3' GENOMIC PROMOTER REGION AND POLYMERASE GENE
MUTATIONS RESPONSIBLE FOR ATTENUATION IN VIRUSES 
OF THE ORDER DESIGNATED MONONEGAVIRALES 

Field Of The Invention

This invention relates to isolated,
recombinantly-generated, attenuated, nonsegmented,
negative-sense, single stranded RNA viruses of the
Order designated Mononegavirales having at least one
attenuating mutation in the 3' genomic promoter region
and having at least one attenuating mutation in the RNA
polymerase gene. This invention was made with
Government support under a grant awarded by the Public
Health Service. The Government has certain rights in
the invention.

Background Of The Invention

Enveloped, negative-sense, single stranded
RNA viruses are uniquely organized and expressed. The
genomic RNA of negative-sense, single stranded viruses
serves two template functions in the context of a
nucleocapsid: as a template for the synthesis of
messenger RNAs (mRNAs) and as a template for the
synthesis of the antigenome (+) strand. Negative-sense,
single stranded RNA viruses encode and package
their own RNA-dependent RNA polymerase. Messenger RNAs
are only synthesized once the virus has been uncoated
in the infected cell. Viral replication occurs after
synthesis of the mRNAs and requires the continuous
synthesis of viral proteins. The newly synthesized
antigenome (+) strand serves as the template for
generating further copies of the (-) strand genomic
RNA.

The polymerase complex actuates and achieves
transcription and replication by engaging the
cis-acting signals at the 3' end of the genome, in
particular, the promoter region. Viral genes are then
transcribed from the genome template unidirectionally
from its 3' to its 5' end. There is always less mRNA
made from the downstream genes (e.g., the polymerase
gene (L)) relative to their upstream neighbors (e.g.,
the nucleoprotein gene (N)). Therefore, there is always
a gradient of mRNA abundance according to the position
of the genes relative to the 3' end of the genome.
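
The positional gradient described above can be illustrated with a toy calculation. The following is a minimal sketch and not a model taken from this application: it assumes that the polymerase continues across each gene junction with a fixed probability (the hypothetical parameter `reinit_prob`), and it uses the measles virus gene order 3'-N-P-M-F-H-L-5' as the example. The value 0.7 is an arbitrary placeholder.

```python
def mrna_gradient(gene_order, reinit_prob):
    """Relative mRNA abundance per gene under a toy junction-attenuation model.

    gene_order  -- gene names listed from the 3' end of the genome
    reinit_prob -- assumed probability (0..1) that the polymerase continues
                   past each gene junction; hypothetical, not from the text
    """
    if not 0.0 <= reinit_prob <= 1.0:
        raise ValueError("reinit_prob must be between 0 and 1")
    # The first gene is set to 1.0; each later gene is scaled by one more junction.
    return {gene: reinit_prob ** i for i, gene in enumerate(gene_order)}


MEASLES_GENE_ORDER = ["N", "P", "M", "F", "H", "L"]
levels = mrna_gradient(MEASLES_GENE_ORDER, reinit_prob=0.7)
for gene, level in levels.items():
    print(f"{gene}: {level:.3f}")
```

Under this assumption the 3'-proximal N gene is always the most abundant transcript and the 5'-proximal L gene the least, whatever value below 1 is chosen for the junction probability.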

Based on the revised reclassification in 1993
by the International Committee on the Taxonomy of
Viruses, an Order, designated Mononegavirales, has been
established. This Order contains three families of
enveloped viruses with single stranded, nonsegmented
RNA genomes of minus polarity (negative-sense). These
families are the Paramyxoviridae, Rhabdoviridae and
Filoviridae. The family Paramyxoviridae has been
further divided into two subfamilies, Paramyxovirinae
and Pneumovirinae. The subfamily Paramyxovirinae
contains three genera, Paramyxovirus, Rubulavirus and
Morbillivirus. The subfamily Pneumovirinae contains
the genus Pneumovirus.

The new classification is based upon
morphological criteria, the organization of the viral
genome, biological activities and the sequence
relationships of the proteins. The morphological
distinguishing feature among enveloped viruses for the
subfamily Paramyxovirinae is the size and shape of the
nucleocapsids (diameter 18 nm, 1 µm in length, pitch of
5.5 nm), which have a left-handed helical symmetry. The
biological criteria are: 1) antigenic cross-reactivity
between members of a genus, and 2) the presence of
neuraminidase activity in the genera Paramyxovirus and
Rubulavirus and its absence in the genus Morbillivirus. In
addition, variations in the coding potential of the P
gene are considered, as is the presence of an extra
gene (SH) in Rubulaviruses.

Pneumoviruses can be distinguished from
Paramyxovirinae morphologically because they contain
narrow nucleocapsids. In addition, pneumoviruses have
major differences in the number of protein-encoding
cistrons (10 in pneumoviruses versus 6 in
Paramyxovirinae) and an attachment protein (G) that is
very different from that of Paramyxovirinae. Although
the paramyxoviruses and pneumoviruses have six proteins
that appear to correspond in function (N, P, M, G/H/HN,
F and L), only the latter two proteins exhibit
significant sequence relatedness between the two
subfamilies. Several pneumoviral proteins lack
counterparts in most of the paramyxoviruses, namely the
nonstructural proteins NS1 and NS2, the small
hydrophobic protein SH, and a second protein M2. Some
paramyxoviral proteins, namely C and V, lack
counterparts in pneumoviruses. However, the basic
genomic organization of pneumoviruses and
paramyxoviruses is the same. The same is true of
rhabdoviruses and filoviruses. Table 1 presents the
current taxonomical classification of these viruses,
together with examples of each genus.

Table 1

Classification of Nonsegmented, negative-sense, single
stranded RNA Viruses of the Order Mononegavirales

Family Paramyxoviridae
  Subfamily Paramyxovirinae
    Genus Paramyxovirus
      Sendai virus (mouse parainfluenza virus type 1)
      Human parainfluenza virus (PIV) types 1 and 3
      Bovine parainfluenza virus (BPV) type 3
    Genus Rubulavirus
      Simian virus 5 (SV) (Canine parainfluenza virus type 2)
      Mumps virus
      Newcastle disease virus (NDV) (avian Paramyxovirus 1)
      Human parainfluenza virus types 2, 4a and 4b
    Genus Morbillivirus
      Measles virus (MV)
      Dolphin Morbillivirus
      Canine distemper virus (CDV)
      Peste-des-petits-ruminants virus
      Phocine distemper virus
      Rinderpest virus
  Subfamily Pneumovirinae
    Genus Pneumovirus
      Human respiratory syncytial virus (RSV)
      Bovine respiratory syncytial virus
      Pneumonia virus of mice
      Turkey rhinotracheitis virus
Family Rhabdoviridae
    Genus Lyssavirus
      Rabies virus
    Genus Vesiculovirus
      Vesicular stomatitis virus
    Genus Ephemerovirus
      Bovine ephemeral fever virus
Family Filoviridae
    Genus Filovirus
      Marburg virus


WO 98/13501
PCT/US97/16718



For many of these viruses, no vaccines of any 
kind are available. Thus, there is a need to develop 
vaccines against such human and animal pathogens. Such 
vaccines would have to elicit a protective immune 
response in the recipient. The qualitative and
quantitative features of such a favorable response are 
extrapolated from those seen in survivors of natural 
virus infection, who, in general, are protected from 
reinfection by the same or highly related viruses for 
some significant duration thereafter. 

A variety of approaches can be considered in 
seeking to develop such vaccines, including the use of: 
(1) purified individual viral protein vaccines (subunit
vaccines); (2) inactivated whole virus preparations;
and (3) live, attenuated viruses. 

Subunit vaccines have the desirable feature 
of being pure, definable and relatively easily produced 
in abundance by various means, including recombinant 
DNA expression methods. To date, with the notable 
exception of hepatitis B surface antigen, viral subunit 
vaccines have generally only elicited short-lived 
and/or inadequate immunity, particularly in naive 
recipients.

Formalin inactivated whole virus preparations 
of polio (IPV) and hepatitis A have proven safe and 
efficacious. In contrast, immunization with similarly 
inactivated whole viruses such as respiratory syncytial 
virus and measles virus vaccines elicited unfavorable 
immune responses and/or response profiles which 
predisposed vaccinees to exaggerated or aberrant 
disease when subsequently confronted with the natural
or "wild-type" virus.

Early attempts (1966) to vaccinate young
children used a parenterally administered
formalin-inactivated RSV vaccine. Unfortunately, several field
trials of this vaccine revealed serious adverse
reactions: the development of a severe illness with
unusual features following subsequent natural infection
with RSV (Bibliography entries 1,2). It has been
suggested that this formalinized RSV antigen elicited
an abnormal or unbalanced immune response profile,
predisposing the vaccinee to RSV disease (3,4).

Thereafter, live, attenuated RSV vaccine
candidates were generated by cold passage or chemical
mutagenesis. These RSV strains were found to have
reduced virulence in seropositive adults.
Unfortunately, they proved either over- or
under-attenuated when given to seronegative infants; in some
cases, they also were found to lack genetic stability
(5,6). Another vaccination approach using parenteral
administration of live virus was ineffective and
efforts along this line were discontinued (7).
Notably, these live RSV vaccines were never associated
with disease enhancement as observed with the
formalin-inactivated RSV vaccine described above. Currently,
there are no RSV vaccines approved for administration
to humans, although clinical trials are now in progress
with cold-passaged, chemically mutagenized strains of
RSV designated A2 and B-1.

Appropriately attenuated live derivatives of
wild-type viruses offer a distinct advantage as vaccine
candidates. As live, replicating agents, they initiate
infection in recipients during which viral gene
products are expressed, processed and presented in the
context of the vaccinee's specific MHC class I and II
molecules, eliciting humoral and cell-mediated immune
responses, as well as the coordinate cytokine patterns,
which parallel the protective immune profile of
survivors of natural infection.

This favorable immune response pattern is
contrasted with the delimited responses elicited by
inactivated or subunit vaccines, which typically are
largely restricted to the humoral immune surveillance
arm. Further, the immune response profile elicited by
some formalin inactivated whole virus vaccines, e.g.,
measles and respiratory syncytial virus vaccines
developed in the 1960's, has not only failed to
provide sustained protection, but in fact has led to a
predisposition to aberrant, exaggerated, and even fatal
illness, when the vaccine recipient later confronted
the wild-type virus.

While live, attenuated viruses have highly
desirable characteristics as vaccine candidates, they
have proven to be difficult to develop. The crux of
the difficulty lies in the need to isolate a derivative
of the wild-type virus which has lost its
disease-producing potential (i.e., virulence), while retaining
sufficient replication competence to infect the
recipient and elicit the desired immune response
profile in adequate abundance.

Historically, this delicate balance between
virulence and attenuation has been achieved by serial
passage of a wild-type viral isolate through different
host tissues or cells under varying growth conditions
(such as temperature). This process presumably favors
the growth of viral variants (mutants), some of which
have the favorable characteristic of attenuation.
Occasionally, further attenuation is achieved through
chemical mutagenesis as well.

This propagation/passage scheme typically 
leads to the emergence of virus derivatives which are 
temperature sensitive, cold-adapted and/or altered in 
their host range, one or all of which are changes

Acute measles infections in previously
immunized adolescents and young adults point to an
additional problem. These secondary vaccine failures
indicate limitations in the current vaccines' ability
to induce and maintain antiviral protection that is
both abundant and long-lived (11,12,13). Recently, yet
another potential problem was revealed. The
hemagglutinin protein of wild-type measles isolated
over the past 15 years has shown a progressively
increasing distance from the vaccine strains (14).
This "antigenic drift" raises legitimate concerns that
the vaccine strains may not contain the ideal antigenic
repertoire needed to provide optimal protection. Thus,
there is a need for improved vaccines.

Rational vaccine design would be assisted by 
a better understanding of these viruses, in particular, 
by the identification of the virally encoded 
determinants of virulence as well as those genomic 
changes which are responsible for attenuation. 

Summary Of The Invention 

Accordingly, it is an object of this
invention to identify those regions of the genome of
the RNA viruses of the Order Mononegavirales where
mutations result in attenuation of those viruses.

It is a further object of this invention to
produce recombinantly-generated viruses which
incorporate such attenuating mutations in their
genomes.

It is still a further object of this
invention to formulate vaccines containing such
attenuated viruses.

These and other objects of the invention as
discussed below are achieved by the generation and
isolation of recombinantly-generated, attenuated,
nonsegmented, negative-sense, single stranded RNA
viruses of the Order Mononegavirales having at least
one attenuating mutation in the 3' genomic promoter
region and having at least one attenuating mutation in
the RNA polymerase gene.

In the case of measles virus, at least one
attenuating mutation in the 3' genomic promoter region
is selected from the group consisting of nucleotide 26
(A -> T), nucleotide 42 (A -> T or A -> C) and
nucleotide 96 (G -> A), where these nucleotides, as
well as others delineated in this application (unless
stated otherwise), are presented in positive strand,
antigenomic, that is, message (coding) sense, and at
least one attenuating mutation in the RNA polymerase
gene is selected from the group consisting of
nucleotide changes which produce changes in an amino
acid selected from the group consisting of residues 331
(isoleucine -> threonine), 1409 (alanine -> threonine),
1624 (threonine -> alanine), 1649 (arginine ->
methionine), 1717 (aspartic acid -> alanine), 1936
(histidine -> tyrosine), 2074 (glutamine -> arginine)
and 2114 (arginine -> lysine).

In the case of human parainfluenza virus type
3, at least one attenuating mutation in the 3' genomic
promoter region is selected from the group consisting
of nucleotide 23 (T -> C), nucleotide 24 (C -> T),
nucleotide 28 (G -> T) and nucleotide 45 (T -> A), and
at least one attenuating mutation in the RNA polymerase
gene is selected from the group consisting of
nucleotide changes which produce changes in an amino
acid selected from the group consisting of residues 942
(tyrosine -> histidine), 992 (leucine ->
phenylalanine), 1292 (leucine -> phenylalanine), and
1558 (threonine -> isoleucine).

In the case of human respiratory syncytial
virus subgroup B, at least one attenuating mutation in
the 3' genomic promoter region is selected from the
group consisting of nucleotide 4 (C -> G) and the
insertion of an additional A in the stretch of A's at
nucleotides 6-11, and at least one attenuating mutation
in the RNA polymerase gene is selected from the group
consisting of nucleotide changes which produce changes
in an amino acid selected from the group consisting of
residues 353 (arginine -> lysine), 451 (lysine ->
arginine), 1229 (aspartic acid -> asparagine), 2029
(threonine -> isoleucine) and 2050 (asparagine ->
aspartic acid).
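
The promoter mutations listed above lend themselves to a simple tabular encoding. The sketch below is illustrative only: the dictionary layout and the function name `apply_promoter_mutation` are assumptions, and the 100-nucleotide sequence is a synthetic placeholder built so that the measles positions carry the stated wild-type bases; it is not a real viral sequence. Positions are 1-based and in antigenomic (message) sense, as stated above.

```python
# Promoter point mutations from the text:
# (position, wild-type base, allowed replacement bases), antigenomic sense.
PROMOTER_MUTATIONS = {
    "measles": [(26, "A", "T"), (42, "A", "TC"), (96, "G", "A")],
    "PIV-3": [(23, "T", "C"), (24, "C", "T"), (28, "G", "T"), (45, "T", "A")],
    "RSV-B": [(4, "C", "G")],  # the A insertion at nucleotides 6-11 is not modeled
}


def apply_promoter_mutation(seq, position, wild_type, new_base):
    """Return seq with one substitution at a 1-based position.

    Raises ValueError if the base found at that position is not the
    expected wild-type base, which guards against numbering mistakes.
    """
    index = position - 1
    if seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + new_base + seq[index + 1:]


# Synthetic placeholder: all C, with the measles wild-type bases set in place.
placeholder = ["C"] * 100
for pos, wt, _ in PROMOTER_MUTATIONS["measles"]:
    placeholder[pos - 1] = wt
wild_type_seq = "".join(placeholder)

mutant_seq = wild_type_seq
for pos, wt, replacements in PROMOTER_MUTATIONS["measles"]:
    # Use the first listed replacement for this demonstration.
    mutant_seq = apply_promoter_mutation(mutant_seq, pos, wt, replacements[0])

print(mutant_seq[25], mutant_seq[41], mutant_seq[95])
```

Checking the wild-type base before substituting is a deliberate choice: a mismatch usually means the sequence is in the wrong sense or the numbering is offset.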

In another embodiment of this invention,
attenuated virus is used to prepare vaccines which
elicit a protective immune response against the
wild-type form of the virus.

In yet another embodiment of this invention,
an isolated, positive strand, antigenomic message sense
nucleic acid molecule (or an isolated, negative strand
genomic sense nucleic acid molecule) having the
complete viral nucleotide sequence (whether of
wild-type virus or virus attenuated by non-recombinant
means) is manipulated by introducing one or more of the
attenuating mutations described in this application to
generate an isolated, recombinantly-generated
attenuated virus. This virus is then used to prepare
vaccines which elicit a protective immune response
against the wild-type form of the virus.

In still another embodiment of this
invention, such a complete wild-type or vaccine viral
nucleotide sequence is used: (1) to design PCR primers
for use in a PCR assay to detect the presence of the
corresponding virus in a sample; or (2) to design and
select peptides for use in an ELISA to detect the
presence of the corresponding virus in a sample.

Brief Description Of The Figures 

Figure 1 depicts the passage history of the 
Edmonston measles virus (15). The abbreviations have
the following meanings: HK - human kidney; HA - human 
amnion; CE(am) - chick embryo; CEF - chick embryo 
fibroblast; DK - dog kidney; WI-38 - human diploid 
cells; SK - sheep kidney; * - plaque cloning. The 
number following each abbreviation represents the 
number of passages. 

Figure 2 depicts a map of the measles virus
genome showing putative cis-acting regulatory elements
at and near the genome and antigenome termini. Top - a
schematic map of the measles virus genome, beginning at
the 3' end with 52 nucleotides of leader sequence (l)
and ending at the 5' terminus with 37 nucleotides of
trailer sequence (t). Gene boundaries are denoted by
vertical bars; below each gene is the number of
cistronic nucleotides. Bottom - an expanded schematic
view of the 3' extended genomic promoter regions of
genome and antigenome, showing the position and
sequence of the two highly conserved domains, A and B.
The intervening intergenic trinucleotide is denoted as
well. Nascent 5' RNAs encompassing the A' to B'
regions are presumed to contain the regulatory sequence
at which the N protein encapsidation initiates.

Figure 3 depicts a genetic map of the RSV
subgroup B wild-type strains designated 2B and 18537
(top portion), the intergenic sequences of those
strains (middle portion) and the 68 nucleotide overlap
between the M2 and L genes (bottom portion). The RSV
2B strain has six fewer nucleotides in the G gene,
encoding two fewer amino acid residues in the G
protein, as compared to the 18537 strain. The 2B
strain has 145 nucleotides in the 5' trailer region, as
compared to 149 nucleotides in the 18537 strain. The
2B strain has one more nucleotide in each of the NS-1,
NS-2 and N genes, and one fewer nucleotide in each of
the M and F genes, as compared to the 18537 strain.

Detailed Description Of The Invention

Transcription and replication of
negative-sense, single stranded RNA viral genomes are achieved
through the enzymatic activity of a multimeric protein
acting on the ribonucleoprotein core (nucleocapsid).
Naked genomic RNA cannot serve as a template. Instead,
these genomic sequences are recognized only when they
are entirely encapsidated by the N protein into the
nucleocapsid structure. It is only in that context
that the genomic and antigenomic terminal promoter
sequences are recognized to initiate the
transcriptional or replication pathways.

All paramyxoviruses require the two viral
proteins, L and P, for these polymerase pathways to
proceed. The pneumoviruses, including RSV, also
require the transcription elongation factor, M2, for
the transcriptional pathway to proceed efficiently.
Additional cofactors may also play a role, including
perhaps the virus-encoded NS1 and NS2 proteins, as well
as perhaps host-cell encoded proteins.

However, considerable evidence indicates that
it is the L protein which performs most, if not all,
the enzymatic processes associated with transcription
and replication, including initiation and termination
of ribonucleotide polymerization, capping and
polyadenylation of mRNA transcripts, methylation and
perhaps specific phosphorylation of P proteins. The L
protein's central role in genomic transcription and
replication is supported by its large size, sensitivity
to mutations, and its catalytic level of abundance in
the transcriptionally active viral complex (16).

These considerations led to the proposal that
L proteins consist of a linear array of domains whose
concatenated structure integrates discrete functions
(17). Indeed, three such delimited, discrete elements
within the negative-sense virus L protein have been
identified based on their relatedness to defined
functional domains of other well-characterized
proteins. These include: (1) a putative RNA template
recognition and/or phosphodiester bond formation
domain; (2) an RNA binding element; and (3) an ATP
binding domain. All prior studies of L proteins of
nonsegmented negative-sense, single stranded RNA
viruses have revealed these putative functional
elements (17).

Without being bound by the following, it is 
reasonable to presume that these non-protein coding, 
promoter and other cis-acting genomic regulatory
domains are important determinants of the efficiency 
with which transcription and replication by measles 
virus (MV) and other viruses of the Order 
Mononegavirales are actualized, in association with the 
L protein, and that they may therefore be virulence 
determinants for these viruses as well. 

In summary, the invention is believed to
encompass a coordinate set of changes between the
cis-acting regulatory signal (3' genomic promoter region)
and the polymerase gene (L) which results in
attenuation of the virus while retaining sufficient
ability of the virus to replicate. Attenuation is
optimized by rational mutations of the 3' genomic
promoter region and the polymerase gene, which provide
the desired balance of replication efficiency, so that
the virus vaccine is no longer able to produce disease,
yet retains its capacity to infect the vaccinee's
cells, to express sufficiently abundant gene products
to elicit the full spectrum and profile of desirable
immune responses, and to reproduce and disseminate
sufficiently to maximize the abundance of the immune
response elicited.

Without being bound by the following,
attenuating mutations in the extended promoter (3'
genomic promoter region) and in the polymerase gene are
believed to affect the display of cis-acting signals
and the conformation of the polymerase complex engaging
these signals. For example, when encapsidated, the
promoter RNA is coiled in a helical array. Changes in
promoter sequence may affect the relative positions at
which the conserved signals are displayed relative to
one another. Specifically, the measles wild-type 3'
genomic promoter region has a pyrimidine (uracil) at
positions 26 and 42 (the antigenomic message sense
sequences have the purine adenine). The vaccine
strains have purines at those positions (the
antigenomic message sense sequences have the
corresponding pyrimidines; see Table 3 in Example 1
below). The larger purines may change the distance
and/or angular display between the conserved domains of
the promoter (e.g., in measles, positions 1-11 and
87-98), resulting in an altered spatial presentation of
the cis-acting signals to the polymerase.
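
Because the mutations are reported in antigenomic sense while the promoter itself is read in genomic sense, it can help to convert between the two. The following minimal sketch (the function names are illustrative assumptions) complements a single base and classifies it as purine or pyrimidine, reproducing the relationship described above for measles positions 26 and 42.

```python
# Watson-Crick complements for RNA bases; an antigenomic T is treated as U.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def to_genomic_base(antigenomic_base):
    """Complement of a single antigenomic-sense base (DNA-style T is accepted as U)."""
    base = antigenomic_base.upper().replace("T", "U")
    return RNA_COMPLEMENT[base]


def base_class(base):
    """Return 'purine' for A or G, otherwise 'pyrimidine'."""
    return "purine" if base.upper() in PURINES else "pyrimidine"


# Measles positions 26 and 42: antigenomic A in wild-type, T or C in vaccine strains.
for label, antigenomic in [("wild-type", "A"), ("vaccine", "T"), ("vaccine", "C")]:
    genomic = to_genomic_base(antigenomic)
    print(f"{label}: antigenomic {antigenomic} -> genomic {genomic} ({base_class(genomic)})")
```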

Animal studies have demonstrated a decrease
in viral replication sufficient to avoid illness but
adequate to elicit the desired immune response. This
likely represents a decrease in transcription, a
decrease in gene expression of virally encoded
proteins, a decrease in antisense templates and,
therefore, the production of fewer new genomes. The
resulting attenuated viruses are significantly less
virulent than the wild-type.

The attenuating mutations described herein
may be introduced into viral strains by two methods:

(1) Conventional means such as chemical
mutagenesis during virus growth in cell cultures to
which a chemical mutagen has been added, selection of
virus that has been subjected to passage at suboptimal
temperature in order to select temperature sensitive
and/or cold adapted mutations, identification of mutant
viruses that produce small plaques in cell culture, and
passage through heterologous hosts to select for host
range mutations. These viruses are then screened for
attenuation of their biological activity in an animal
model. Attenuated viruses are subjected to nucleotide
sequencing of their 3' genomic promoter region and
polymerase genes to locate the sites of attenuating
mutations. Once this has been done, method (2) is then
carried out.

(2) A preferred means of introducing
attenuating mutations comprises making predetermined
mutations using site-directed mutagenesis. These
mutations are identified either by method (1) or by
reference to closely-related viruses whose attenuating
mutations are already known. One or more mutations are
introduced into each of the 3' genomic promoter region
and the polymerase gene. Cumulative effects of
different combinations of coding and non-coding changes
can also be assessed.

The mutations to the 3' genomic promoter
region and polymerase gene are introduced by standard
recombinant DNA methods into a DNA copy of the viral
genome. This may be a wild-type or a modified viral
genome background (such as viruses modified by method
(1)), thereby generating a new virus. Infectious
clones or particles containing these attenuating
mutations are generated using the cDNA "rescue" system,
which has been applied to a variety of viruses,
including Sendai virus (18); measles virus (19);
respiratory syncytial virus (20); rabies (21);
vesicular stomatitis virus (VSV) (15); and rinderpest
virus (23); these references are hereby incorporated by
reference. See, for measles virus rescue, published
International patent application WO 97/06270,
designating the United States (24); for PIV-3 rescue,
U.S. provisional patent application 60/047575 (25); for
RSV rescue, published International patent application
WO 97/12032, designating the United States (26); these
applications are hereby incorporated by reference.

Briefly, all Mononegavirales rescue systems
can be summarized as follows: Each requires a cloned
DNA equivalent of the entire viral genome placed
between a suitable DNA-dependent RNA polymerase
promoter (e.g., the T7 RNA polymerase promoter) and a
self-cleaving ribozyme sequence (e.g., the hepatitis
delta ribozyme) which is inserted into a propagatable
bacterial plasmid. This transcription vector provides
the readily manipulable DNA template from which the RNA
polymerase (e.g., T7 RNA polymerase) can faithfully
transcribe a single-stranded RNA copy of the viral
antigenome (or genome) with the precise, or nearly
precise, 5' and 3' termini. The orientation of the
viral genomic DNA copy and the flanking promoter and
ribozyme sequences determine whether antigenome or
genome RNA equivalents are transcribed. Also required
for rescue of new virus progeny are the virus-specific
trans-acting proteins needed to encapsidate the naked,
single-stranded viral antigenome or genome RNA
transcripts into functional nucleocapsid templates:
the viral nucleocapsid (N or NP) protein, the
polymerase-associated phosphoprotein (P) and the
polymerase (L) protein. These proteins comprise the
active viral RNA-dependent RNA polymerase which must
engage this nucleocapsid template to achieve
transcription and replication.

The trans-acting proteins required for
measles virus rescue are the encapsidating protein N,
and the polymerase complex proteins, P and L. For
PIV-3, the encapsidating protein is designated NP, and the
polymerase complex proteins are also referred to as P
and L. For RSV, the virus-specific trans-acting
proteins include N, P and L, plus an additional
protein, M2, the RSV-encoded transcription elongation
factor.
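
The protein requirements just listed can be captured as a small lookup, for example to check that a planned set of expression plasmids is complete. This is a sketch only; the dictionary keys and the helper name `missing_proteins` are illustrative assumptions rather than part of any rescue protocol.

```python
# Trans-acting proteins required for rescue, as listed in the text.
REQUIRED_PROTEINS = {
    "measles": {"N", "P", "L"},
    "PIV-3": {"NP", "P", "L"},
    "RSV": {"N", "P", "L", "M2"},
}


def missing_proteins(virus, supplied):
    """Return the sorted list of required proteins not covered by `supplied`."""
    return sorted(REQUIRED_PROTEINS[virus] - set(supplied))


# An RSV rescue planned with only N, P and L plasmids lacks M2.
print(missing_proteins("RSV", ["N", "P", "L"]))
print(missing_proteins("measles", ["N", "P", "L"]))
```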

Typically, these viral trans-acting proteins
are generated from one or more plasmid expression
vectors encoding the required proteins, although some
or all of the required trans-acting proteins may be
produced within mammalian cells engineered to contain
and express these virus-specific genes and gene
products as stable transformants.

The typical (although not necessarily
exclusive) circumstances for rescue include an
appropriate mammalian cell milieu in which T7
polymerase is present to drive transcription of the
antigenomic (or genomic) single-stranded RNA from the
viral genomic cDNA-containing transcription vector.
Either cotranscriptionally or shortly thereafter, this
viral antigenome (or genome) RNA transcript is
encapsidated into functional templates by the
nucleocapsid protein and engaged by the required
polymerase components produced concurrently from
co-transfected expression plasmids encoding the required
virus-specific trans-acting proteins. These events and
processes lead to the prerequisite transcription of
viral mRNAs, the replication and amplification of new
genomes and, thereby, the production of novel viral
progeny, i.e., rescue.

For the rescue of rabies, VSV and Sendai, T7
polymerase is provided by recombinant vaccinia virus
vTF7-3. This system, however, requires that the
rescued virus be separated from the vaccinia virus by
physical or biochemical means or by repeated passaging
in cells or tissues that are not a good host for
poxvirus. For MV cDNA rescue, this requirement is
avoided by creating a cell line that expresses T7
polymerase, as well as viral N and P proteins. Rescue
is achieved by transfecting the genome expression
vector and the L gene expression vector into the helper
cell line. Advantages of the host-range mutant of the
vaccinia virus, MVA-T7, which expresses the T7 RNA
polymerase, but does not replicate in mammalian cells,
are exploited to rescue RSV, Rinderpest virus and MV.
After simultaneous expression of the necessary
encapsidating proteins, synthetic full length
antigenomic viral RNAs are encapsidated, replicated and
transcribed by viral polymerase proteins and replicated
genomes are packaged into infectious virions. In
addition to such antigenomes, genome analogs have now
been successfully rescued for Sendai and PIV-3 (25,27).

The rescue system thus provides a composition
which comprises a transcription vector comprising an
isolated nucleic acid molecule encoding a genome or
antigenome of a nonsegmented, negative-sense, single
stranded RNA virus of the Order Mononegavirales having
at least one attenuating mutation in the 3' genomic
promoter region and having at least one attenuating
mutation in the RNA polymerase gene, together with at
least one expression vector which comprises at least
one isolated nucleic acid molecule encoding the
trans-acting proteins necessary for encapsidation,
transcription and replication (e.g., N, P and L for
measles virus; NP, P and L for PIV-3; N, P, L and M2
for RSV). Host cells are then transformed or
transfected with the at least two expression vectors
just described. The host cells are cultured under
conditions which permit the co-expression of these
vectors so as to produce the infectious attenuated
virus.

The rescued infectious virus is then tested
for its desired phenotype (temperature sensitivity,
cold adaptation, plaque morphology, and transcription
and replication attenuation), first by in vitro means.
The mutations at the cis-acting 3' genomic promoter
region are also tested using the minireplicon system
where the required trans-acting encapsidation and
polymerase activities are provided by wild-type or
vaccine helper viruses, or by plasmids expressing the
N, P and different L genes harboring gene-specific
attenuating mutations (19,28).

If the attenuated phenotype of the rescued
virus is present, challenge experiments are conducted
with an appropriate animal model. Non-human primates
provide the preferred animal model for the pathogenesis
of human disease. These primates are first immunized
with the attenuated, recombinantly-generated virus,
then challenged with the wild-type form of the virus.
Monkeys are infected by various routes, including but
not limited to intranasal, intratracheal or
subcutaneous routes of inoculation (29).
Experimentally infected rhesus and cynomolgus macaques
have also served as animal models for studies of
vaccine-induced protection against measles (30).
Protection is measured by such criteria as disease
signs and symptoms, survival, virus shedding and
antibody titers. If the desired criteria are met, the
attenuated, recombinantly-generated virus is considered
a viable vaccine candidate for testing in humans. The
"rescued" virus is considered to be "recombinantly-
generated", as are the progeny and later generations of
the virus, which also incorporate the attenuating
mutations.

Even if a "rescued" virus is underattenuated
or overattenuated relative to optimum levels for 
vaccine use, this is information which is valuable for 
developing such optimum strains. 

Optimally, a codon containing an attenuating 
point mutation may be stabilized by introducing a 
second or a second plus a third mutation in the codon 
without changing the amino acid encoded by the codon 
bearing only the attenuating point mutation. 
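
As a purely illustrative sketch of this stabilization idea, the Python
below ranks the codons that encode the attenuating amino acid by how
many nucleotide changes separate each of them from the wild-type codon;
a codon two or three changes away cannot revert through a single point
mutation. The function names and the example codons (ATA for
isoleucine, with threonine as the attenuating residue) are assumptions
chosen for illustration and are not taken from the sequence listings.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a, b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def stabilized_codons(wild_codon, mutant_aa):
    """Codons encoding mutant_aa, most distant from wild_codon first."""
    options = [c for c, aa in CODON_TABLE.items() if aa == mutant_aa]
    return sorted(options, key=lambda c: (-hamming(c, wild_codon), c))


# Hypothetical example: wild-type ATA (Ile), attenuating residue Thr.
for codon in stabilized_codons("ATA", "T"):
    print(codon, hamming(codon, "ATA"))
```

In this hypothetical case the single-change codon ACA is listed last,
while ACC, ACG and ACT each differ from ATA at two positions and would
be the stabilized choices.
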
Infectious virus clones containing the attenuating and 
stabilizing mutations are also generated using the cDNA 
"rescue" system described above. 

Measles virus serves as a useful model for
this invention, because sequence data are now available
as described herein for the disease-causing wild-type
virus and for the disease-preventing vaccines which
have a demonstrated history of efficacy.

Measles virus was first isolated in tissue
culture in 1954 (31) from an infected patient named
David Edmonston. This Edmonston strain of measles
became the progenitor for many live-attenuated measles
vaccines, including Moraten, which is the current
vaccine in the United States (Attenuvax™; Merck Sharp &
Dohme, West Point, PA). Moraten was licensed in 1968
and has proven to be efficacious.

Aggressive immunization programs instituted
in the mid to late 1960s resulted in the precipitous
drop in reported measles cases from near 700,000 in
1965 to 1500 in 1983. In parallel, other vaccine
strains were also developed from the Edmonston strain
(see Fig. 1): Schwarz (Institut Merieux, Lyon, France),
Zagreb (Zagreb, Yugoslavia) and AIK-C (Japan). These
other vaccines have also proven to be efficacious and
have been used extensively. An early, reactogenic,
underattenuated vaccine strain (Rubeovax™; Merck Sharp
& Dohme) produced measles-like illness in children and
its use thus was discontinued. It, however, was
further attenuated successfully to produce the Moraten
vaccine strain (see Fig. 1) (32). Live measles virus
vaccine provides a success story of the development of
an efficacious vaccine and provides a model for
understanding the molecular mechanisms of viral vaccine
attenuation among nonsegmented, negative-sense, single
stranded RNA viruses.

Because of its significance as a major cause
of human morbidity and mortality, measles virus (MV)
has been quite extensively studied. MV is a large,
relatively spherical, enveloped particle composed of
two compartments, a lipoprotein membrane and a
ribonucleoprotein particle core, each having distinct
biological functions (33). The virion envelope is a
host cell-derived plasma membrane modified by three
virus-specified proteins: The hemagglutinin (H;
approximately 80 kilodaltons (kD)) and fusion (F1,2;
approximately 60 kD) glycoproteins project on the
virion surface and confer host cell attachment and
entry capacities to the viral particle (16).
Antibodies to H and/or F are considered protective
since they neutralize the virus' ability to initiate
infection (34,35,36). The matrix (M; approximately 37
kD) protein is the amphipathic protein lining the
membrane's inner surface, which is thought to
orchestrate virion morphogenesis and thus consummate
virus reproduction (37). The virion core contains the
15,894 nucleotide long genomic RNA upon which template
activity is conferred by its intimate association with
approximately 2600 molecules of the approximately 60 kD
nucleocapsid (N) protein (38,39,40). Loosely
associated with this approximately one micron long
helical ribonucleoprotein particle are enzymatic levels
of the viral RNA dependent RNA polymerase (L;
approximately 240 kD) which, in concert with the
polymerase cofactor (P; approximately 70 kD), and
perhaps yet other virus-specified as well as
host-encoded proteins, transcribes and replicates the
MV genome sequences (41).

To date, the entire nucleotide sequences
(only for the Edmonston B laboratory strain and the
AIK-C vaccine strain), coding potential, and
organization of the MV genome have been reported (33).
The six virion structural proteins are encoded by six
contiguous, non-overlapping genes which are arrayed as
follows: 3'-N-P-M-F-H-L-5'. Two additional MV gene
products of as yet uncertain function have also been
identified. These two nonstructural proteins, known as
C (approximately 20 kD) and V (approximately 45 kD),
are both encoded by the P gene, the former by a second
reading frame within the P mRNA; the latter by a
cotranscriptionally edited P gene-derived mRNA which
encodes a hybrid protein having the amino terminal
sequences of P and a new zinc finger-like cysteine-rich
carboxy terminal domain (16).

In addition to the sequences encoding the
virus-specified proteins, the MV genome contains
distinctive non-protein coding domains resembling those
directing the transcriptional and replicative pathways
of related viruses (16,42). These regulatory signals
lie at the 3' and 5' ends of the MV genome and in short
internal regions spanning each intercistronic boundary.
The former encode the putative promoter and/or
regulatory sequence elements directing genomic
transcription, genome and antigenome encapsidation, and
replication. The latter signal transcription
termination and polyadenylation of each monocistronic
viral mRNA and then reinitiation of transcription of
the next gene. In general, the MV polymerase complex
appears to respond to these signals much as the
RNA-dependent RNA polymerases of other non-segmented
negative strand RNA viruses (16,42,43,44).

Transcription initiates at or near the 3' end
of the MV genome and then proceeds in a 5' direction
producing monocistronic mRNAs (40,42,45). As the
polymerase traverses the MV genomic template, it
encounters putative stop/start signals which, in 3' to
5' order, are: a semi-conserved transcription
termination/polyadenylation signal (A/G U/C UA A/U NN
A4, where N may be any of the four bases) at which each
monocistronic RNA is completed; a non-transcribed
intergenic trinucleotide punctuation mark (CUU; except
at the H:L boundary where it is CGU); and a
semi-conserved start signal for transcription initiation
of the next gene (AGG A/G NN C/A A A/G G A/U, where N
may be any of the four bases) (45,46). Since some
polymerase complexes fail to reinitiate, the abundance
of each MV mRNA diminishes in parallel with the
distance of the encoding gene from the genomic 3' end.
This mRNA gradient directly corresponds to the relative
abundance of each virus-specified protein. This
indicates that MV protein expression is ultimately
controlled at the transcriptional level (44).
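
The stop/start consensus signals described above can be written as
regular expressions. The sketch below is illustrative only: it spells
the signals exactly as they are given in the text (reading A4 as four
adenosines), joins them in the order termination signal, intergenic
trinucleotide, start signal, and scans an RNA string for such
junctions. The helper name `find_junctions` and the toy sequence are
invented for the example, and the choice of strand on which a real
sequence would be scanned is left as an assumption for the reader.

```python
import re

# Consensus signals as written in the text (N = any base, A4 = AAAA).
STOP = r"[AG][UC]UA[AU][ACGU]{2}A{4}"        # termination/polyadenylation
INTERGENIC = r"C[UG]U"                       # CUU, or CGU at the H:L boundary
START = r"AGG[AG][ACGU]{2}[CA]A[AG]G[AU]"    # transcription start

JUNCTION = re.compile(f"({STOP})({INTERGENIC})({START})")


def find_junctions(sequence):
    """Return (offset, stop, intergenic, start) for each gene junction found."""
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group(1), m.group(2), m.group(3))
            for m in JUNCTION.finditer(rna)]


# Toy sequence invented for illustration; it is not a measles sequence.
toy = "GGCC" + "AUUAUCCAAAA" + "CUU" + "AGGAUCCAAGA" + "GG"
print(find_junctions(toy))
```

Because the signals are only semi-conserved, a scan of this kind should
be treated as a way to locate candidate junctions rather than as a
definitive annotation.
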

The 3' and 5' MV genomic termini contain
non-protein coding sequences with distinct parallels to
the leader and trailer RNA encoding regions of VSV
(42). Nucleotides 1-55 define the region between the
genomic 3' terminus and the beginning of the N gene,
while 37 additional nucleotides can be found between
the end of the L gene and the 5' terminus of the
genome. However, unlike VSV, or even the
paramyxoviruses Sendai and NDV, MV does not transcribe
these terminal regions into short, unmodified (+) or
(-) sense leader RNAs (47,48,49). Instead, leader
readthrough transcripts, including full-length
polyadenylated leader:N, leader:N:P, leader:N:P:M, and
of course full-length antigenome MV RNAs are
transcribed (48,49). Thus, the short leader
transcript, the key operational element determining the
switch from transcription to replication of the VSV
single-stranded, negative polarity genome (50,51,52),
seems absent in MV. This leads to consideration and
exploration of alternative models for this crucial
reproductive event (42).

Measles virus, as well as all other
Mononegavirales except the rhabdoviruses, appears to
have extended its terminal regulatory domains beyond
the confines of leader and trailer encoding sequences
(42). For measles, these regions encompass the 107 3'
genomic nucleotides (the "3' genomic promoter region",
also referred to as the "extended promoter", which
comprises 52 nucleotides encoding the leader region,
followed by three intergenic nucleotides, and 52
nucleotides encoding the 5' untranslated region of N
mRNA) and the 109 5' end nucleotides (69 encoding the
3' untranslated region of L mRNA, the intergenic
trinucleotide and 37 nucleotides encoding the trailer).
Within these 3' terminal approximately 100 nucleotides
of both the genome and antigenome are two short regions
of shared nucleotide sequence: 14 of 16 nucleotides at
the absolute 3' ends of the genome and antigenome are
identical. Internal to those termini, an additional
region of 12 nucleotides of absolute sequence identity
has been located. The position of these regions at and
near the sites at which the transcription of the MV
genome must initiate and replication of the antigenome
must begin suggests that these short unique sequence
domains encompass an extended promoter region.
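
The segment lengths quoted above can be checked with a few lines of
arithmetic. The sketch below is a minimal illustration that lays the
segments out as 1-based coordinate ranges, assuming the antigenomic
numbering used elsewhere in this description (position 1 at the genomic
3' end, genome length 15,894). The segment labels and the `layout`
helper are names introduced for the example.

```python
GENOME_LENGTH = 15894

# (label, length in nucleotides), in order from the genomic 3' end.
PROMOTER_3 = [("leader", 52), ("intergenic", 3), ("N 5' UTR", 52)]
# Segments ending at the genomic 5' terminus.
END_5 = [("L 3' UTR", 69), ("intergenic", 3), ("trailer", 37)]


def total(parts):
    """Combined length of a list of (label, length) segments."""
    return sum(length for _, length in parts)


def layout(parts, first):
    """Assign inclusive 1-based coordinates to consecutive segments."""
    ranges, pos = [], first
    for label, length in parts:
        ranges.append((label, pos, pos + length - 1))
        pos += length
    return ranges


print(total(PROMOTER_3), layout(PROMOTER_3, 1))
print(total(END_5), layout(END_5, GENOME_LENGTH - total(END_5) + 1))
```

Under these assumptions the leader plus the intergenic trinucleotide
occupy nucleotides 1-55, in agreement with the region said to precede
the N gene, and the 109 nucleotide 5' end region begins at nucleotide
15,786.
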

These discrete sequence elements may dictate
alternative sites of transcription initiation, the
internal domain mandating transcription initiation at
the N gene start site, and the 3' terminal domain
directing antigenome production (42,48,53). In
addition to their regulatory role as cis-acting
determinants of transcription and replication, these 3'
extended genomic and antigenomic promoter regions
encode the nascent 5' ends of antigenome and genome
RNAs, respectively. Within these nascent RNAs reside
as yet unidentified signals for N protein nucleation,
another key regulatory element required for
nucleocapsid template formation and consequently for
amplification of transcription and replication. Figure
2 schematically shows the location and sequence of
these highly conserved, putative cis-acting regulatory
domains.

Terminal non-protein coding regions similar
in location, size and spacing are present in the
genomes of other members of the Family Paramyxoviridae,
though only 8-11 of their absolute terminal nucleotides
are shared by MV (42,54). The genomic termini of the
Morbillivirus canine distemper virus (CDV) display a
greater degree of homology with its MV relative: 73%
of the nucleotides of the leader and trailer sequences
of these two viruses are identical, including 16 of 18
at the absolute 3' termini and 17 of 18 at their 5'
ends (55). No accessory internal CDV genomic domain
sharing homology to that of the MV extended promoter
has been found. However, there is a 20 nucleotide long
stretch lying between CDV genomic nucleotides 85 and
104 and 15,587 and 15,606 in which 15 of the 20
nucleotides are complementary (GenBank accession
number AF 14953). This indicates that CDV, like MV,
contains an additional region within its non-coding 3'
genomic and antigenomic ends that may provide important
cis-acting promoter and/or regulatory signals (55).

Additionally, the precise length of the
3'-leader region (55 nucleotides) is identical among
several members of the Family Paramyxoviridae (MV, CDV,
PIV-3, BPV-3, SV and NDV). Further evidence for the
importance of these extended, non-protein coding
regions comes from analyses of a large number of
distinct copy-back Defective Interfering Viruses (DIs)
recently cloned from subacute sclerosing
panencephalitis (SSPE) brain tissue. No DI with a stem
shorter than the 95 5' terminal genomic nucleotides was
found. This indicates that the minimal signals needed
for MV DI RNA replication and encapsidation extend well
beyond the 37 nucleotide long trailer sequence to
encompass the additional internal putative regulatory
domain (56).

As exemplified in part by measles virus, this
invention is directed to the concept that important
virulence/attenuation determinants reside in viral
genomic non-protein coding regulatory regions and in
the trans-acting transcription/replication enzyme
complex with which these cis-acting elements must
interact. The cis-acting domains are found both at the
3' and 5' ends of the MV genome, flanking the six
contiguous genes encoding viral structural proteins;
and within the MV genome as short regions encompassing
internal intergenic boundaries. The former encode the
putative promoter and/or regulatory sequence elements
directing the vital processes of genomic transcription,
genome and antigenome encapsidation, and replication.
The latter signal transcription termination and
polyadenylation of each monocistronic viral mRNA and
then reinitiation of transcription of the next gene.
The transcription/replication enzyme, the RNA dependent
RNA polymerase molecule, can modulate transcription
and/or replicative efficiency, thereby determining the
abundance of cytopathic viral gene products and/or
virion progeny.

Proof of the concept of this invention for
measles virus is obtained by first determining the
nucleotide sequences of the non-coding regulatory
regions (3' genomic promoter region) and the coding
regions of the L gene (with predicted amino acid
sequences) of the progenitor Edmonston wild-type MV
isolate, together with available measles vaccine
strains derived from this isolate (see Figure 1).
Other independent wild-type isolates were examined for
comparative purposes as well.

The nucleotide sequences (in positive strand,
antigenomic, message sense) of four wild-type and five
vaccine measles strains, as well as the deduced amino
acid sequences of the RNA polymerase (L protein) of
these measles viruses, are set forth as follows with
reference to the appropriate SEQ ID NOS. contained
herein:

Virus          Nucleotide Sequence    L Protein Sequence

Wild-Type
Edmonston      SEQ ID NO: 1           SEQ ID NO: 2
1977           SEQ ID NO: 3           SEQ ID NO: 4
1983           SEQ ID NO: 5           SEQ ID NO: 6
Montefiore     SEQ ID NO: 7           SEQ ID NO: 8

Vaccine
Rubeovax™      SEQ ID NO: 9           SEQ ID NO: 10
Moraten        SEQ ID NO: 11          SEQ ID NO: 12
Zagreb         SEQ ID NO: 13          SEQ ID NO: 14
AIK-C          SEQ ID NO: 15          SEQ ID NO: 16



Each measles virus genome listed above is 
15,894 nucleotides in length. Translation of the L 
gene starts with the codon at nucleotides 9234-9236; 
the translation stop codon is at nucleotides 15783- 
15785. The translated L protein is 2,183 amino acids 
long. 
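
The coordinates given above are mutually consistent, as a short check
shows. The sketch below is a minimal illustration that assumes
inclusive 1-based nucleotide positions, running from the first base of
the start codon to the last base of the stop codon; the function name
is introduced for the example.

```python
def protein_length(first_nt, last_nt):
    """Amino acid count for an ORF given inclusive nucleotide coordinates.

    first_nt is the first base of the start codon and last_nt is the
    last base of the stop codon; the stop codon encodes no residue.
    """
    span = last_nt - first_nt + 1
    if span % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return span // 3 - 1


# Measles virus L gene coordinates as stated in the text.
print(protein_length(9234, 15785))
```

The 6,552 nucleotides from position 9234 through 15785 make up 2,184
codons, one of which is the stop codon, leaving the 2,183 residues
stated for the L protein.
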

Note that nucleotide 2499 of the 1983 wild-type
measles virus is indicated as "G" in SEQ ID NO: 5. In
fact, the base is actually a mixture of "G" and "C".
Also note that nucleotide 2143 of Rubeovax™ vaccine
virus is indicated as "T" in SEQ ID NO: 9. In nine
clones sequenced, this base was "T" in seven and "C" in
two; thus, this base can be "T" or "C".

In addition, the Schwarz vaccine virus genome
is identical to that of the Moraten vaccine virus
genome (SEQ ID NO: 11), except that at nucleotides 4917
and 4924, Schwarz has a "C" instead of a "T".

Nucleotide differences distinguishing the 3'
genomic promoter region and nucleotide and amino acid
differences distinguishing the L gene and L protein
sequences of the Edmonston wild-type isolate, vaccine
strains and other independently isolated wild-type
viruses were then compared and aligned (see Tables 3-5
in Example 1 below).

As shown in Table 3, there were three
mutations in the 3' genomic promoter region (in
antigenomic, message sense) between the progenitor
wild-type MV isolate and the derivative vaccine
strains: at nucleotide position 26, from "A" to "T"; at
position 42, from "A" to "C" or from "A" to "T"; and in
the case of Zagreb only, at position 96, from "G" to
"A". In addition, the other examined wild-type isolates
differed from both the progenitor wild-type isolate and
the vaccine strains at position 50 by having "A"
instead of "G".

The predicted amino acid sequences of the L
genes of measles vaccine strains (Rubeovax™, Moraten,
Schwarz, AIK-C and Zagreb) and wild-type isolates
(1977, 1983 and Montefiore) differ from the progenitor
strain (Edmonston) at 49 positions in the 2183 amino
acid long open reading frame (see Tables 4 and 5 in
Example 1 below).

These amino acid differences can be divided
into four categories:

(1) Positions where one vaccine strain
differs from the progenitor, as well as from other
vaccine and wild-type strains, suggesting a potential
attenuation site.

(2) Specific differences between all
wild-type and all vaccine sequences; these may also
constitute important attenuation sites.

(3) Residues where chronologically newer
wild-types differ from older wild-types, which may be
attributable to genetic drift.

(4) Positions where one or more vaccine
strains and/or wild-type strains have common amino
acids and differ from all the other strains; these
changes may represent lineage-specific, potentially
attenuating changes within the vaccine strains and
relatedness among the wild-type isolates, respectively.

There were four category (1) changes where
one vaccine differed from the other vaccines, as well
as the wild-type strains. Two of these were in Moraten
and Schwarz (amino acids 331 and 2114) and two were in
AIK-C (1624 and 2074). These mutations are of special
interest because all of these viruses are good
vaccines. Thus, these positions are sites for
attenuation.

Only one position, 1717, fits into category 
(2), with all wild-types having aspartic acid and all 
vaccines having alanine. Interestingly, this position 
is in one of two areas where the L genes of measles and 
canine distemper virus (which are otherwise highly 
homologous) do not show exceptional conservation. This 
difference makes it more likely that 1717 is a key 
position for an attenuating mutation in measles. 

There were five positions, 149, 636, 720,
2017 and 2119, where both chronologically newer
wild-types (1983 and Montefiore) differ from older
wild-types (Edmonston and 1977), which therefore fit
into category (3). These differences suggest genetic
drift rather than denoting sites of attenuating
mutations. Not included in this total are 16 positions
where Montefiore (the 1989 isolate) differed from the
rest (see Table 5). These could be either genetic drift
(category (3)) or random change (category (4)). The
remaining 23 positions are category (4), with one or
more of the viruses differing from the consensus.
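
The category counts given in the preceding paragraphs can be tallied
against the 49 differing positions reported for the L protein. The
sketch below is only a bookkeeping aid; the dictionary layout and the
names used are assumptions introduced for the example, while the
positions and counts are those stated in the text.

```python
# L protein positions named in the text, grouped by category.
CATEGORY_POSITIONS = {
    "category 1": [331, 1624, 2074, 2114],
    "category 2": [1717],
    "category 3": [149, 636, 720, 2017, 2119],
}
MONTEFIORE_ONLY = 16        # positions assignable to category 3 or 4
REMAINING_CATEGORY_4 = 23   # includes positions 1409, 1649 and 1936
TOTAL_REPORTED = 49


def tally():
    """Sum the listed positions and the two counts given without listings."""
    listed = sum(len(p) for p in CATEGORY_POSITIONS.values())
    return listed + MONTEFIORE_ONLY + REMAINING_CATEGORY_4


print(tally(), tally() == TOTAL_REPORTED)
```
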

Three of these positions (1409, 1649, 1936)
are potentially attenuating category (4) mutations.
These are changes where two vaccine strains have a
common change from the progenitor wild-type strain.
These changes may be connected with the vaccine lineage
leading to the Rubeovax™ and Moraten vaccines (Figure
1).

Applicants have found that their AIK-C
vaccine strain nucleotide sequence differs from the
published sequence (33) at 21 positions, including one
insertion and one deletion. Several of these
differences result in coding changes, including two in
the L gene (at amino acids 1477 and 2008).

Thus, the additional changes accrued within
the L gene sequence as the measles progenitor strain is
progressively attenuated to achieve a replicative
capacity optimized for live vaccine purposes appear to
be constrained and delimited. Presumably, this limited
tolerance in the number and location of L gene changes
is imposed not only by the need to preserve the
multifunctional capacities of the polymerase, but also
by the preexisting 3' promoter changes with which the
evolving L protein must interact to achieve
transcription and replication. In other words, optimal
virus attenuation requires coordinate (i.e., linked)
changes in the polymerase protein and the cis-acting
regulatory elements on which it acts.

The 3'-leader displays the least tolerance
for change, allowing highly selected changes during the
attenuation process at nucleotide position 26 (always
the change from "A" to "T"), and at position 42 (the
change from "A" to "C" or from "A" to "T") (in
antigenomic, message sense). In the case of Zagreb
only, there is a single further change, from "G" to "A"
at position 96, which may be important when combined
with Zagreb L gene-specific changes. The 3'-leader
region seems to have undergone only one instance of
genetic drift since 1954, with a change of "G" to "A"
at position 50 (see Table 3).

The net change in the 3' genomic promoter
region during the attenuation process is the
replacement of two pyrimidines by two purines in
genomic sense in all MV vaccine strains. The
co-evolution of the L gene during these attenuation
processes is believed to reflect selection of subtle
changes favoring reproduction of the viruses in
different host cells. All the vaccine strains were
grown in chick embryo (CE) or chick embryo fibroblast
(CEF) cells during their attenuation process (Figure
1). In addition, some vaccine strains have been
exposed to unique host cells; i.e., Zagreb vaccine was
grown in dog kidney cells and human diploid cells,
while the AIK-C vaccine was adapted to sheep kidney
cells. Moraten and Rubeovax™ were exclusively
developed in CE and CEF.
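
The statement that two pyrimidines are replaced by two purines in
genomic sense follows from complementing the antigenomic-sense changes
at positions 26 and 42. The sketch below is illustrative: it converts
each antigenomic base to its complementary genomic RNA base and
classifies it. The function and variable names are assumptions
introduced for the example.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def genomic_base(antigenomic_base):
    """Genomic (negative-sense) RNA base complementary to an antigenomic base."""
    return COMPLEMENT[antigenomic_base.upper().replace("T", "U")]


def base_class(base):
    """Classify an RNA base as a purine or a pyrimidine."""
    return "purine" if base in PURINES else "pyrimidine"


# Antigenomic-sense changes at positions 26 and 42 as stated in the text:
# position -> (wild-type base, bases observed in vaccine strains).
CHANGES = {26: ("A", ("T",)), 42: ("A", ("T", "C"))}


def summarize(changes):
    """List (position, wild base, class, vaccine base, class) in genomic sense."""
    rows = []
    for pos, (wild, vaccine_bases) in sorted(changes.items()):
        for vac in vaccine_bases:
            w, v = genomic_base(wild), genomic_base(vac)
            rows.append((pos, w, base_class(w), v, base_class(v)))
    return rows


for row in summarize(CHANGES):
    print(row)
```

In each case the wild-type genomic base is U, a pyrimidine, and the
vaccine base is A or G, a purine.
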

Some of the lineage-specific L gene changes
(position 1649 in Rubeovax™, Moraten and Schwarz
vaccines and the change at position 1717 in all
vaccines) represent a subset of adaptations of the L
gene to the 3'-leader to modulate the
transcription/replication processes for vaccine
attenuation. Additionally, individual vaccine-specific
changes (category (1)) may provide additional fine tune
modulation of virus replication/transcription for each
vaccine strain.

Based on Table 3 and the foregoing
discussion, the key attenuating mutations for the MV 3'
genomic promoter region are nucleotide 26 (A -> T),
nucleotide 42 (A -> T or A -> C) and nucleotide 96
(G -> A) (in antigenomic, message sense).

Based on Table 4 and the foregoing
discussion, the key attenuating sites for the L protein
are as follows: amino acid residues 331 (isoleucine ->
threonine), 1409 (alanine -> threonine), 1624
(threonine -> alanine), 1649 (arginine -> methionine),
1717 (aspartic acid -> alanine), 1936 (histidine ->
tyrosine), 2074 (glutamine -> arginine) and 2114
(arginine -> lysine). It is understood that the
nucleotide changes responsible for these amino acid
changes are not limited to those set forth in Table 4
of Example 1 below; all changes in nucleotides which
result in codons which are translated into these amino
acids are within the scope of this invention.

Human parainfluenza virus type 3 (HPIV-3) is
another nonsegmented, negative-sense, single stranded
enveloped RNA virus. HPIV-3 belongs to the Family
Paramyxoviridae (see Table 1). The genome of HPIV-3 is
15,462 nucleotides long and encodes six non-overlapping
protein-encoding genes (57). Five of the genes encode
a single virion structural protein each, which are
designated NP (corresponding to the N protein of MV),
M, F, HN (hemagglutinin-neuraminidase) and L. The
sixth mRNA encodes the P protein, and by an overlapping
5' proximal open reading frame (ORF) encodes the C
protein, and by the RNA editing mechanism, also encodes
the D protein.

Like MV, HPIV-3 has a 3' non-protein
coding leader region of 55 nucleotides, but unlike
measles, whose 5'-trailer region is 37 nucleotides
long, it has a 44 nucleotide long 5'-trailer region.
The polymerase transcribes the genome in a linear,
sequential, start-stop manner which is guided by
transcription signals in the RNA template.

Attempts to develop a live attenuated HPIV-3
vaccine by passaging the wild-type virus JS strain
through cell culture at sub-optimal temperature have
produced promising results (7,57). Several "cold
passage" (cp) mutants were isolated for evaluation from
different passage levels of the JS strain. One such
mutant resulted from 45 serial passages and was
designated cp45.

This virus exhibited three interesting
properties: (1) cold adaptation (ca): the ability to
replicate efficiently at the suboptimal temperature of
20°C; (2) temperature sensitivity (ts): inability to
replicate in vitro at temperatures greater than or
equal to 39°C; and (3) small plaque morphology. This
mutant appeared to be a promising vaccine candidate
because: (a) its ca, ts and small plaque phenotype is
stable after passage in cell culture; (b) its
replication is restricted in both the upper and lower
respiratory tract of hamsters; and (c) it induced
significant protection in hamsters against subsequent
challenge with wild-type HPIV-3 (58,59).

Evaluation of this strain in the rhesus
monkey showed the attenuation mutations in cp45 to be a
combination of ts and non-ts mutations (60).
Subsequent evaluation in chimpanzees indicated that
cp45 appeared to be satisfactorily attenuated while
still able to induce a high level of protection against
wild-type virus challenge (61). Later preliminary
clinical evaluation of cp45 in seronegative human
infants and small children suggested that this
candidate vaccine strain is suitably infectious and
attenuated, as well as being moderately immunogenic
(61).

The cp45 strain has been grown in both fetal
rhesus lung (FRhL) and Vero cells as follows: The
PIV-3 cp45 virus grown in FRhL cells was prepared by
inoculating confluent FRhL cell monolayers in tissue
culture flasks at an MOI of 0.1-1.0. The infected cell
cultures were fed with EMEM medium and incubated at
32°C. About seven days later, when maximal cytopathic
effects (syncytia) were observed, the virus was
harvested by subjecting the cultures to one freeze-thaw
cycle, pooling the fluids and then storing the virus at
-70°C.

The PIV-3 cp45 virus grown in Vero cells was
prepared by inoculating with virus a bioreactor culture
of confluent monolayers of Vero cells on microcarrier
beads which was continuously stirred. The infected
bioreactor culture was maintained at 30°C. The virus
was harvested 4-5 days later when syncytial CPE was
observed. The culture fluid containing the virus was
stored at -70°C.

The nucleotide sequences (in positive strand,
antigenomic, message sense) of the HPIV-3 JS wild-type
strain (89) and the cp45 vaccine strain grown in FRhL
and Vero cells, as well as the deduced amino acid
sequences of the RNA polymerase (L protein) of these
HPIV-3 viruses, are set forth as follows with reference
to the appropriate SEQ ID NOS. contained herein:

Virus          Nucleotide Sequence    L Protein Sequence

Wild-Type
JS             SEQ ID NO: 17          SEQ ID NO: 18

Vaccine
FRhL cp45      SEQ ID NO: 19          SEQ ID NO: 20
Vero cp45      SEQ ID NO: 21          SEQ ID NO: 22

Each PIV-3 virus genome listed above is
15,462 nucleotides in length. Translation of the L
gene starts with the codon at nucleotides 8646-8648;
the translation stop codon is at nucleotides 15345-
15347. The translated L protein is 2,233 amino acids
long.

As detailed in Example 2 and Table 6 therein
below, based upon the differences between the wild-type
JS strain and the FRhL-grown cp45 mutant vaccine
strain, the key attenuating mutations for the HPIV-3 3'
genomic promoter region are nucleotide 23 (T -> C),
nucleotide 24 (C -> T), nucleotide 28 (G -> T) and
nucleotide 45 (T -> A) (in antigenomic, message sense).
As also detailed in Example 2 and Table 6 therein
below, key attenuating sites for the L protein of
HPIV-3 include the following: amino acid residues 942
(tyrosine -> histidine), 992 (leucine -> phenylalanine)
and 1558 (threonine -> isoleucine).

In addition, the Vero-grown cp45 mutant
vaccine strain contains an additional mutation
resulting from a coding change in the L gene at amino
acid residue 1292 (leucine -> phenylalanine).

It is understood that the nucleotide changes
responsible for these amino acid changes are not
limited to those set forth in Example 2 below; all
changes in nucleotides which result in codons which are
translated into these amino acids are within the scope
of this invention.

Human respiratory syncytial virus (RSV) is
yet another nonsegmented, negative-sense, single
stranded enveloped RNA virus. RSV belongs to the
Subfamily Pneumovirinae and the genus Pneumovirus (see
Table 1).

Two major subgroups of human RSV, designated
A and B, have been identified based on reactivities of
the F and G surface glycoproteins with monoclonal
antibodies (62). More recently, the A and B lineages
of RSV strains have been confirmed by sequence analysis
(63,64). Bovine, ovine, and caprine strains of this
virus have also been isolated. The host specificity of
the virus is most clearly associated with the G
attachment protein, which is highly divergent between
protein and SH proteins suggest that immune pressure
may drive virus evolution.

In mouse and cotton rat models, both the F
and G proteins of RSV elicit neutralizing antibodies,
and immunization with these proteins alone provides
long-term protection against reinfection (67,68).

In humans, complete immunity to RSV does not 
develop and reinfections occur throughout life (69,70); 
however, there is evidence that immune factors will 
protect against severe disease. A decrease in severity
of disease is associated with two or more prior 
infections and there is evidence that children infected 
with one of the two major RSV subgroups may be somewhat 
protected against reinfection with the homologous 
subgroup (71) , observations which suggest that a live 
attenuated virus vaccine may provide protection 
sufficient to prevent serious morbidity and mortality. 
Infection with RSV elicits both antibody and cell 
mediated immunity. Serum neutralizing antibody to the 
F and G proteins has been associated, in some studies, 
with protection from LRD, although reduction in upper 
respiratory disease (URD) has not been demonstrated. 
High levels of serum antibody in infants are associated
with protection against LRD, and administration of
intravenous immunoglobulin with high RSV neutralizing 
antibody titers has been shown to protect against 
severe disease in high risk children (70,72,73). The 
role of local immunity, and nasal antibody in 
particular, is being investigated. 

The RSV virion consists of a 
ribonucleoprotein core contained within a lipoprotein 
envelope. The virions of pneumoviruses are similar in 
size and shape to those of all other paramyxoviruses.
When visualized by negative staining and electron 
microscopy, virions are irregular in shape and range in 
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diameter from 150-300 nm (74) . The nucleocapsid of 
this virus is a symmetrical helix similar to that of
other paramyxoviruses, except that the helical diameter
is 12-15 nm rather than 18 nm. The envelope consists of
a lipid bilayer that is derived from the host membrane
and contains virally coded transmembrane surface
glycoproteins. The viral glycoproteins mediate
attachment and penetration and are organized separately
into virion spikes. All members of the paramyxovirus
subfamily have hemagglutinating activity, but this
function is not a defining feature for pneumoviruses,
being absent in RSV but present in PVM (75).
Neuraminidase activity is present in members of the
genera Paramyxovirus and Rubulavirus, and is absent in
Morbillivirus and Pneumovirus of mice (PVM) (75).

RSV possesses two subgroups, designated A and
B. The wild-type RSV (strain 2B) genome is a single
strand of negative-sense RNA of 15,218 nucleotides (SEQ
ID NO: 23) that are transcribed into ten major
subgenomic mRNAs. Each of the ten mRNAs encodes a
major polypeptide chain: three are transmembrane
surface proteins (G, F and SH); three are the proteins
associated with genomic RNA to form the viral
nucleocapsid (N, P and L); two are nonstructural
proteins (NS1 and NS2) which accumulate in the infected
cells but are also present in the virion in trace
amounts and may play a role in regulating transcription
and replication; one is the nonglycosylated virion
matrix protein (M); and the last is M2, another
nonglycosylated protein recently shown to be an
RSV-specified transcription elongation factor (see Figure
3). These ten viral proteins account for nearly all of
the viral coding capacity.

The viral genome is encapsidated with the
major nucleocapsid protein (N), and is associated with
the phosphoprotein (P), and the large (L) polymerase
protein. These three proteins have been shown to be
necessary and sufficient for directing RNA replication
of cDNA encoded RSV minigenomes (76). Further studies
have shown that for transcription to proceed with full
processivity, the M2 protein (ORF 1) is required (74).
When the M2 protein is missing, truncated transcripts
predominate, and rescue of the full length genome does
not occur (74).

Both the M (matrix protein) and the M2
proteins are internal virion-associated proteins that
are not present in the nucleocapsid structure. By
analogy with other nonsegmented negative-stranded RNA
viruses, the M protein is thought to render the
nucleocapsid transcriptionally inactive before
packaging and to mediate its association with the viral
envelope. The NS1 and NS2 proteins have only been
detected in very small amounts in purified virions, and
at this time are considered non-structural. Their
functions are uncertain, though they may be regulators
of transcription and replication. Three transmembrane
surface glycoproteins are present in virions: G, F, and
SH. G and F (fusion) are envelope glycoproteins that
are known to mediate attachment and penetration of the
virus into the host cell. In addition, these
glycoproteins represent major independent immunogens
(77). The function of the SH protein is unknown,
although a recent report has implicated its involvement
in the fusion function of the virus (78).

The genomes of two wild-type RSV subgroup B
strains (2B and 18537) have now been sequenced in their
entirety (see SEQ ID NOS:23 and 25, discussed below).
Genomic RNA is neither capped nor polyadenylated (79).
In both the virion and intracellularly, genomic RNA is
tightly associated with the N protein.
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The 3' end of the genomic RNA consists of a
44-nucleotide extragenic leader region that is presumed
to contain the major viral promoter (Fig. 3). The 3'
genomic promoter region is followed by ten viral genes
in the order 3'-NS1-NS2-N-P-M-SH-G-F-M2-L-5' (Fig. 3).
The L gene is followed by a 145-149 nucleotide
extragenic trailer region (see Figure 3). Each gene
begins with a conserved nine-nucleotide gene start
signal 3'-GGGGCAAAU (except for the ten-nucleotide gene
start signal of the L gene, which is 3'-GGGACAAAAU;
differences underlined). For each gene, transcription
begins at the first nucleotide of the signal. Each
gene terminates with a semi-conserved 12-14 nucleotide
gene end (3'-A G U/G U/A A N N N U/A A(3-5), where N can be
any of the four bases and A(3-5) is a run of three to
five A residues) that directs transcription
termination and polyadenylation (Fig. 3). The first
nine genes are non-overlapping and are separated by
intergenic regions that range in size from 3 to 56
nucleotides for RSV B strains (Fig. 3). The intergenic
regions do not contain any conserved motifs or any
obvious features of secondary structure and have been
shown to have no influence on expression of the
preceding and succeeding genes in a minireplicon system
(Fig. 3). The last two RSV genes overlap by 68
nucleotides (Fig. 3). The gene-start signal of the L
gene is located inside of, rather than after, the M2
gene. This 68 nucleotide overlap sequence encodes the
last 68 nucleotides of the M2 mRNA (exclusive of the
poly-A tail), as well as the first 68 nucleotides of
the L mRNA.
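
The gene-start and gene-end signals described above can be expressed as simple patterns. The following is a sketch only: the motifs are read left to right exactly as printed in the text, the function name `find_signals` is an illustrative assumption, and the toy sequence is hypothetical rather than taken from any SEQ ID NO.

```python
import re

# Motifs transcribed from the text, read left to right as printed.
GENE_START = re.compile(r"GGGGCAAAU")      # nine-nucleotide gene start signal
GENE_START_L = re.compile(r"GGGACAAAAU")   # ten-nucleotide L gene start signal
# A G U/G U/A A N N N U/A followed by a run of three to five A residues,
# which gives a total motif length of 12 to 14 nucleotides.
GENE_END = re.compile(r"AG[UG][UA]A[ACGU]{3}[UA]A{3,5}")


def find_signals(rna):
    """Return (0-based start index, matched text) pairs for each motif."""
    return {
        "gene_start": [(m.start(), m.group()) for m in GENE_START.finditer(rna)],
        "gene_start_L": [(m.start(), m.group()) for m in GENE_START_L.finditer(rna)],
        "gene_end": [(m.start(), m.group()) for m in GENE_END.finditer(rna)],
    }


# Hypothetical toy sequence: filler + gene start + body + gene end + filler.
toy = "CCC" + "GGGGCAAAU" + "ACGUACGUACGU" + "AGUUACGUUAAAA" + "CCGGCC"
signals = find_signals(toy)
print(signals)
```

Because the gene-end pattern is only semi-conserved, a scan of a real genome with such a pattern would be expected to need manual review of the hits.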

Ten different species of subgenomic
polyadenylated mRNAs and a number of polycistronic
polyadenylated read-through transcripts are the
products of genomic transcription (74).

Transcriptional mapping studies using UV light mediated
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genomic inactivation showed that RSV genes are 
transcribed in their 3' to 5' order from a single
promoter near the 3' end (80). Thus, RSV synthesis
appears to follow the single entry, sequential
transcription model proposed for all Mononegavirales
(16,81). According to this model, the polymerase (L)
contacts genomic RNA in the nucleocapsid form at the 3'
genomic promoter region and begins transcription at the
first nucleotide. RSV mRNAs are co-linear copies of
the genes, with no evidence of mRNA editing or
splicing.

Sequence analysis of intracellular RSV mRNAs
showed that synthesis of each transcript begins at the
first nucleotide of the gene start signal (74). The 5'
ends of the mRNAs are capped with the structure
m7G(5')ppp(5')Gp (where the underlined G is the first
template nucleotide of the mRNA) and the mRNAs are
polyadenylated at their 3' ends (82). Both of these
modifications are thought to be made
co-transcriptionally by the viral polymerase. Three
regions of the RSV 3' genomic promoter have been found
to be important as cis-acting elements (83). These
regions are the first ten nucleotides (presumably
acting as a promoter), nucleotides 21-25, and the gene
start signal located at nucleotides 45-53 (83). Unlike
other Paramyxovirinae, such as measles, Sendai and
PIV-3, the remainder of the leader and non-coding region of
the NS1 gene of RSV was found to be highly tolerant of
insertions, deletions and substitutions (83).

Additionally, by saturation mutagenesis
(wherein each base is replaced independently by each of
the other three bases and compared for translation and
replication efficiencies) within the first 12
nucleotides of the 3' genomic promoter region, a
U-tract located at nucleotides 6-10 was shown to be
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highly inhibitory to substitutions (83) . In contrast, 
the first five nucleotides were relatively tolerant of
a number of substitutions and two of them at position
four were up-regulatory mutations, resulting in a four-
to 20-fold increase in RSV-CAT RNA replication and
transcription. Using a bi-cistronic minireplicon
system, gene-start and gene-end motifs were shown to be
signals for mRNA synthesis and appear to be self
contained and largely independent of the nature of
adjoining sequence (84).
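
The saturation mutagenesis design described above (each base replaced independently by each of the other three) can be enumerated mechanically. The sketch below is illustrative only: `saturation_variants` is a hypothetical helper, and the 12-nucleotide `promoter` string is an arbitrary placeholder with a U-tract at positions 6-10, not a sequence taken from this specification.

```python
BASES = "ACGU"


def saturation_variants(seq):
    """List every single-base substitution of seq.

    Each entry is (1-based position, original base, new base, variant sequence).
    """
    variants = []
    for index, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                variant = seq[:index] + alt + seq[index + 1:]
                variants.append((index + 1, ref, alt, variant))
    return variants


# Placeholder 12-mer (assumption for illustration only).
promoter = "UGCGCUUUUUAC"
variants = saturation_variants(promoter)
print(len(variants))  # three substitutions at each of the 12 positions
```

For a 12-nucleotide window this yields 36 single-substitution variants, each of which would be assayed separately.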

The L gene start signal lies 68 nucleotides
upstream of the M2 gene-end signal, resulting in gene
overlap (Fig. 3) (74). The presence of the M2 gene-end
signal within the L gene results in a high frequency of
premature termination of L gene transcripts. Full
length L mRNA is much less abundant and is made when
the polymerase fails to recognize the M2 gene-end
motif. This results in much lower transcription of L
mRNA. The gene overlap seems incompatible with a model
of linear sequential transcription. It is not known
whether the polymerase that exits the M2 gene jumps
backward to the L gene-start signal or whether there is
a second, internal promoter for L gene transcription
(74). It is also possible that the L gene is
accessible by a small fraction of polymerases that fail
to start transcription at the M2 gene-start signal and
slide down the M2 gene to the L gene-start signal.

The relative abundance of each RSV mRNA 
decreases with the distance of its gene from the 
promoter, presumably due to polymerase fall-off during 
sequential transcription (80) . Gene overlap is a 
second mechanism that reduces the synthesis of full 
length L mRNA. Also, certain mRNAs have features that
might reduce the efficiency of translation. The 
initiation codon for SH mRNA is in a suboptimal Kozak 
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sequence context, while the G ORF begins at the second 
methionyl codon in the mRNA. 
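
The gradient of mRNA abundance described above can be illustrated with a toy model. This is a sketch under a strong assumption that is not stated in the text: a constant fraction of polymerases (`fall_off`, an arbitrary illustrative value) is lost at every gene junction. It ignores the additional reduction of L mRNA caused by the M2/L gene overlap.

```python
# Gene order from the 3' promoter, as given in the text.
GENE_ORDER = ["NS1", "NS2", "N", "P", "M", "SH", "G", "F", "M2", "L"]


def relative_abundance(fall_off, genes=GENE_ORDER):
    """Relative mRNA level per gene if a fixed fraction of polymerases
    falls off at each gene junction (illustrative assumption)."""
    if not 0 <= fall_off < 1:
        raise ValueError("fall_off must be in the range [0, 1)")
    return {gene: (1 - fall_off) ** index for index, gene in enumerate(genes)}


levels = relative_abundance(0.2)  # 0.2 is an arbitrary example value
for gene, level in levels.items():
    print(f"{gene:>3}  {level:.3f}")
```

Under this assumption the promoter-proximal NS1 mRNA is the most abundant and the promoter-distal L mRNA the least, in line with the qualitative description.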

RSV RNA replication is thought (74) to follow
the model proposed from studies with vesicular
stomatitis virus and Sendai virus (16,81). This
involves a switch from the stop-start mode of mRNA
synthesis to an antiterminator read-through mode. This
results in synthesis of positive sense
replication-intermediate (RI) RNA that is an exact complementary
copy of genomic RNA. This serves in turn as the
template for the synthesis of progeny genomes. The
mechanism involved in the switch to the antiterminator
mode is proposed to involve cotranscriptional
encapsidation of the nascent RNA by N protein (16,81).
RNA replication in RSV, like other nonsegmented
negative-strand RNA viruses, is dependent on ongoing
protein synthesis (85). Predicted RI RNA has been
detected for the standard virus as well as RSV-CAT
minigenome (74,85). RI RNA was 10-20 fold less
abundant intracellularly than was the progeny genome
both for the standard and the minigenome system. The
nucleotide sequences (in positive strand, antigenomic,
message sense) of various wild-type, vaccine and
revertant RSV strains, as well as the deduced amino
acid sequences of the RNA polymerase (L protein) of
these RSV viruses, are set forth as follows with
reference to the appropriate SEQ ID NOS. contained
herein:
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
Virus             Nucleotide Sequence    L Protein Sequence

Wild-Type
  2B              SEQ ID NO: 23          SEQ ID NO: 24
  18537           SEQ ID NO: 25          SEQ ID NO: 26

Vaccine
  2B33F           SEQ ID NO: 27          SEQ ID NO: 28
  2B20L           SEQ ID NO: 29          SEQ ID NO: 30

Revertant
  2B33F TS(+)     SEQ ID NO: 31          SEQ ID NO: 32
  2B20L TS(+)     SEQ ID NO: 33          SEQ ID NO: 34



Each RSV virus genome encodes an L protein
that is 2,166 amino acids long. Genome length and
other nucleotide information is as follows:



Virus             Genome Length    L Start Codon    L Stop Codon

Wild-Type
  2B              15218            8502-8504        15000-15002
  18537           15229            8509-8511        15007-15009

Vaccine
  2B33F           15219            8503-8505        15001-15003
  2B20L           15219            8503-8505        15001-15003

Revertant
  2B33F TS(+)     15219            8503-8505        15001-15003
  2B20L TS(+)     15219            8503-8505        15001-15003
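
The start and stop codon coordinates listed for each strain can be checked against the stated L protein length of 2,166 amino acids. The following sketch uses only the coordinates given above; the dictionary name `L_ORF` and the helper `l_protein_length` are illustrative assumptions.

```python
# First nucleotide of the L start codon and of the L stop codon, per strain,
# copied from the coordinates listed in the text.
L_ORF = {
    "2B": (8502, 15000),
    "18537": (8509, 15007),
    "2B33F": (8503, 15001),
    "2B20L": (8503, 15001),
    "2B33F TS(+)": (8503, 15001),
    "2B20L TS(+)": (8503, 15001),
}


def l_protein_length(start_codon_first, stop_codon_first):
    """Number of amino acids encoded between the start codon (inclusive)
    and the stop codon (exclusive)."""
    coding_nt = stop_codon_first - start_codon_first
    if coding_nt % 3 != 0:
        raise ValueError("coordinates are not in the same reading frame")
    return coding_nt // 3


lengths = {strain: l_protein_length(*coords) for strain, coords in L_ORF.items()}
print(lengths)
```

Each strain gives the same length, so the listed coordinates are mutually consistent with a 2,166 residue L protein.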



As detailed in Example 3 (especially Tables 7
and 8) below, the key attenuating mutations for the RSV
subgroup B 3' genomic promoter region are nucleotide 4
(C -> G), and the insertion of an additional A in the
stretch of A's at nucleotides 6-11 (in antigenomic
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message sense). As also detailed in Example 3 below,
the key potentially attenuating sites for the L protein
of RSV are as follows: amino acid residues 353
(arginine -> lysine), 451 (lysine -> arginine), 1229
(aspartic acid -> asparagine), 2029 (threonine ->
isoleucine) and 2050 (asparagine -> aspartic acid). It
is understood that the nucleotide changes responsible
for these amino acid changes are not limited to those
set forth in Example 3 below; all changes in
nucleotides which result in codons which are translated
into these amino acids are within the scope of this
invention.
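
The two promoter changes described above (a substitution at nucleotide 4 and one extra A in the A-tract at nucleotides 6-11) can be applied to a sequence as follows. This is a sketch only: the `leader` string is a short placeholder chosen to have C at position 4 and A at positions 6-11, and is not taken from SEQ ID NO: 23; the helper names are illustrative assumptions.

```python
def substitute(seq, position, ref, alt):
    """Replace the base at a 1-based position, checking the expected base."""
    if seq[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return seq[:position - 1] + alt + seq[position:]


def extend_a_tract(seq, start, end):
    """Insert one extra A into the run of A's spanning 1-based start..end."""
    if set(seq[start - 1:end]) != {"A"}:
        raise ValueError("positions do not span a run of A residues")
    return seq[:end] + "A" + seq[end:]


# Placeholder leader fragment in antigenomic, message sense (assumption).
leader = "ACGCGAAAAAATGCG"
mutant = extend_a_tract(substitute(leader, 4, "C", "G"), 6, 11)
print(leader)
print(mutant)
```

Because the insertion lengthens the sequence by one nucleotide, positions downstream of the A-tract are numbered one higher in the mutant than in the parent.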

The attenuated viruses of this invention 
exhibit a substantial reduction of virulence compared 
to wild-type viruses which infect human and animal
hosts. The extent of attenuation is such that symptoms 
of infection will not arise in most immunized 
individuals, but the virus will retain sufficient 
replication competence to be infectious in and elicit 
the desired immune response profile in the vaccinee. 

The attenuated viruses of this invention may 
be used to formulate a vaccine. To do so, the 
attenuated virus is adjusted to an appropriate 
concentration and formulated with any suitable vaccine 
adjuvant, diluent or carrier. Physiologically 
acceptable media may be used as carriers. These 
include, but are not limited to: an appropriate 
isotonic medium, phosphate buffered saline and the
like. Suitable adjuvants include, but are not limited
to MPL™ (3-O-deacylated monophosphoryl lipid A; RIBI
ImmunoChem Research, Inc., Hamilton, MT) and IL-12
(Genetics Institute, Cambridge, MA).

In one embodiment of this invention, the
formulation including the attenuated virus is intended
for use as a vaccine. The attenuated virus may be mixed
with cryoprotective additives or stabilizers such as
proteins (e.g., albumin, gelatin), sugars (e.g.,
sucrose, lactose, sorbitol), amino acids (e.g., sodium
glutamate), saline, or other protective agents. This
mixture is maintained in a liquid state, or is then
desiccated or lyophilized for transport and storage and
mixed with water immediately prior to administration.

Formulations comprising the attenuated 
viruses of this invention are useful to immunize a 
human or animal subject to induce protection against 
infection by the wild-type counterpart of the
attenuated virus. Thus, this invention further 
provides a method of immunizing a subject to induce 
protection against infection by an RNA virus of the 
Order Mononegavirales by administering to the subject 
an effective immunizing amount of a vaccine formulation 
incorporating an attenuated version of that virus as 
described hereinabove. 

A sufficient amount of the vaccine in an 
appropriate number of doses must be administered to the 
subject to elicit an immune response. Persons skilled 
in the art will readily be able to determine such 
amounts and dosages. Administration may be by any
conventional effective form, such as intranasally,
parenterally, orally, or topically applied to any
mucosal surface such as intranasal, oral, eye, vaginal
or rectal surface, such as by an aerosol spray. The
preferred means of administration is by intranasal
administration.

In another embodiment of this invention, an 
isolated nucleic acid molecule having the complete 
viral nucleotide sequence of either the wild-type
viruses or vaccine viruses described herein is used to 
generate oligonucleotide probes (from either positive 
strand antigenomic message sense or negative strand 
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complementary genomic sense) and to express peptides 
(from positive strand antigenomic message sense only),
which are used to detect the presence of those wild-type
virus and/or vaccine strains in samples of body
fluids and tissues. The nucleotide sequences are used
to design highly specific and sensitive diagnostic
tests to detect the presence of the virus in a sample.

Polymerase chain reaction (PCR) primers are
synthesized with sequences based on the viral wild-type
or vaccine sequences described herein. The test sample
is subjected to reverse transcription of RNA, followed
by PCR amplification of selected cDNA regions
corresponding to the nucleotide sequence described
herein which have nucleotides which are distinct for a
defined strain of virus. Amplified PCR products are
identified on gels and their specificity confirmed by
hybridization with specific nucleotide probes.

ELISA tests are used to detect the presence
of antigens of the wild-type or vaccine viral strains.
Peptides are designed and selected to contain one or
more distinct residues based on the wild-type or
vaccine sequences described herein. These peptides are
then coupled to a hapten (e.g., keyhole limpet
hemocyanin (KLH)) and used to immunize animals (e.g.,
rabbits) for the production of monospecific polyclonal
antibody. A selection of these polyclonal antibodies,
or a combination of polyclonal and monoclonal
antibodies, can then be used in a "capture ELISA" to
detect antigens produced by those viruses.

Samples of the Moraten measles virus vaccine 
strain were deposited by Applicants with the American 
Type Culture Collection, 12301 Parklawn Drive, 
Rockville, Maryland 20852, U.S.A., under the provisions 
of the Budapest Treaty for the Deposit of 
Microorganisms for the Purposes of Patent Procedures 
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("Budapest Treaty") and have been assigned ATCC 
accession number VR2587. Samples of the HPIV-3 virus 
Vero-grown cp45 vaccine strain were deposited by 
Applicants with the American Type Culture Collection, 
12301 Parklawn Drive, Rockville, Maryland 20852, 
U.S.A., under the provisions of the Budapest Treaty and 
have been assigned ATCC accession number VR2588. 
Samples of the 2B wild-type RSV virus were deposited by
Applicants with the American Type Culture Collection, 
12301 Parklawn Drive, Rockville, Maryland 20852, 
U.S.A., under the provisions of the Budapest Treaty and 
have been assigned ATCC accession number VR2586. 

Given these three deposited strains and the 
sequence information for these and other strains 
provided herein, one can use site-directed mutagenesis 
and rescue techniques described above to introduce 
mutations (or restore a wild-type genotype) of all the 
strains described herein, as well as taking these 
strains and making additional mutations from the panel 
of mutations set forth in Tables 3, 4 and 6-8 below. 

In order that this invention may be better 
understood, the following examples are set forth. The 
examples are for the purpose of illustration only and 
are not to be construed as limiting the scope of the 
invention. 

Examples 

Standard molecular biology techniques are 
utilized according to the protocols described in 
Sambrook et al. (86).
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Example 1 
Measles 

Moraten MV vaccine virus was grown once,
directly from the Attenuvax™ vaccine vial (Lot #0716B),
the Schwarz vaccine virus was grown once (Lot
96G04/M179 G41D), while the Zagreb and Rubeovax™
vaccine viruses were each grown twice in the Vero cells
before RNAs were made for sequence analysis. MV
wild-type isolate Montefiore (56) was passed 5-6 times
in Vero cells before extraction of RNA materials and
similarly, MV wild-type isolates 1977, 1983 (14) were
grown 5-7 times before extracting materials for
analysis. Edmonston wild-type isolate received from
Dr. J. Beeler (CBER) (see Fig. 1) was the original
Edmonston isolate already passaged seven times in human
kidney cells and three times in Vero cells before
receipt and further passaged once in Vero cells before
using for sequence analysis.

RNA was prepared by infecting Vero cells at a
multiplicity of infection (m.o.i.) of 0.1 to 1.0; the
infection was allowed to reach maximum cytopathology
before the cells were harvested. Total RNA from measles
virus-infected cells was extracted using Trizol™
reagent (Gibco-BRL).

The total RNA isolated from Vero cell passage
material was amplified by the Reverse Transcriptase-PCR
(Perkin-Elmer/Cetus) procedure using measles (Edmonston
B strain (19)) specific primer pairs spanning the 3'
and 5' promoter regions and the L gene of the viral
genome. Table 2 presents these primer sequences. The
primers of SEQ ID NOS:35-54, 74, 77 and 78 are in
antigenomic message sense. The primers of SEQ ID
NOS:55-73, 75, 76 and 79 are in genomic negative-sense.
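
A primer written in genomic negative-sense is the reverse complement of the antigenomic (message-sense) sequence over the same coordinates. The sketch below shows the conversion; `reverse_complement` is an illustrative helper, and the input is the sequence listed for SEQ ID NO: 79 in Table 2, read as printed.

```python
# Watson-Crick complement for the DNA alphabet used in the primer table.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


primer_79 = "ACCAGACAAAGCTGGGAATAGA"       # listed as genomic negative-sense
message_sense = reverse_complement(primer_79)
print(message_sense)                        # same region in antigenomic sense
```

Applying the function twice returns the original primer, which is a quick way to check a sequence after converting between the two senses.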
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Table 2 

Primers for PCR and Sequencing MV L Genes 

and Genomic Termini 

 9047  CATATCACTCACTCTGGGATGGAG    9070  (SEQ ID NO: 35)
 9371  TCAGAACATCAAGCACCGCC        9390  (SEQ ID NO: 36)
 9741  ACAGTCAAGACTGAGATGAG        9760  (SEQ ID NO: 37)
10001  AAGAGTCAGATACATGTGGA       10020  (SEQ ID NO: 38)
10351  ACATGAATCAGCCTAAAGTC       10370  (SEQ ID NO: 39)
10674  CCGAAAGAGTTCCTGCGTTACGACC  10698  (SEQ ID NO: 40)
11083  CAGTCCACACAAGTACCAGG       11102  (SEQ ID NO: 41)
11461  GTCAGAAGCTGTGGACCATC       11480  (SEQ ID NO: 42)
11841  AATATTGCTACAACAATGGC       11860  (SEQ ID NO: 43)
12196  ACTCTTCATTCCTAGACTGG       12215  (SEQ ID NO: 44)
12542  GTCCAATTATGACTATGAAC       12561  (SEQ ID NO: 45)
12891  AGAACAGACATGAAGCTTGC       12910  (SEQ ID NO: 46)
13232  CCAACAAGGAATGCTTCTAG       13251  (SEQ ID NO: 47)
13551  ACAGCACTATCTATGATTGACCTGG  13575  (SEQ ID NO: 48)
13930  GCAACATGGTTTACACATGC       13949  (SEQ ID NO: 49)
14280  AGATTGAGAGTTGATCCAGG       14299  (SEQ ID NO: 50)
14629  AGGAGATACTTAAACTAAGC       14648  (SEQ ID NO: 51)
14981  TAAGCTTATGCCTTTCAGCG       15000  (SEQ ID NO: 52)
15337  TTAACGGACCTAAGCTGTGC       15356  (SEQ ID NO: 53)
15671  GAAACAGATTATTATGACGG       15690  (SEQ ID NO: 54)

 9390  CGGGCTATCTAGGTGAACTTCAGG    9267  (SEQ ID NO: 55)
 9500  ATTTGGATATGGAATATGAG        9491  (SEQ ID NO: 56)
 9840  ACTCAACTGAACTACCAGTG        9821  (SEQ ID NO: 57)
10181  AAGAACATCATGTATTTCAG       10162  (SEQ ID NO: 58)
10549  TTATCAACGCACTGCTCATG       10530  (SEQ ID NO: 59)
10919  ATTTTCAGCAATCACTTGGCATGCC  10995  (SEQ ID NO: 60)
11280  GCCTCTGTGCAAACAAGCTG       11261  (SEQ ID NO: 61)
11638  TCTCTAGTTACTCTAGCAGC       11619  (SEQ ID NO: 62)
12010  AGGTCGTTGTTTGTGAGGAG       11991  (SEQ ID NO: 63)
12361  TCGTCCTCTTCTTTACTGTC       12342  (SEQ ID NO: 64)
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
12689  CCGTCCTCGAGCTAGCCTCG       12670  (SEQ ID NO: 65)
13052  CTCCTCCAGGCTCACATTGG       13033  (SEQ ID NO: 66)
13420  GGGTTGGTACATAGCTCTGC       13401  (SEQ ID NO: 67)
13767  CACCCATCTGATATTTCCCTGATGG  13743  (SEQ ID NO: 68)
14099  TGGTTGACAGTACAAATCTG       14080  (SEQ ID NO: 69)
14460  CTGAAATGGGAAGATTGTGC       14441  (SEQ ID NO: 70)
14820  AGCAATCTACACTGCCTACC       14801  (SEQ ID NO: 71)
15160  TCACAGATGATTCAATTATC       15161  (SEQ ID NO: 72)
15530  GATCCTAGATATAAGTTCTC       15511  (SEQ ID NO: 73)

    1  ACCAAACAAAGTTGGGTAAGG         21  (SEQ ID NO: 74)

GGGGGATCC 100 ATCCCTAATCCTGCTCTTGTCCC 78 (SEQ ID NO: 75)
  200  GATTCCTCTGATGGCTCCAC         161  (SEQ ID NO: 76)
15721  TAACAGTCAAGGAGACCAAAG      15741  (SEQ ID NO: 77)
GGGAAGCTT 15601 AACCCTAATCCTGCCCTAGGTGG 15823 (SEQ ID NO: 78)
15894  ACCAGACAAAGCTGGGAATAGA     15873  (SEQ ID NO: 79)

Overlapping PCR fragments of the complete
viral genome were directly sequenced without cloning to
achieve the consensus sequence, by the dideoxy
terminator cycle sequencing method using both strands
(ABI PRISM 377 sequencer and ABI PRISM sequencing Kit).
To determine the sequence at the absolute termini, a
ligation procedure described previously was used (55).

To test this hypothesis, the nucleotide
sequences were determined for the non-protein coding
regulatory regions and the L gene of the progenitor
Edmonston wild-type MV isolate, for the available
vaccine strains derived from this isolate, as well as
for other wild-type strains. Nucleotide (in
antigenomic, message sense) and amino acid differences
were then compared and aligned as set forth in Tables
3-5 (differences are in italics):
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Table 3 

Differences in MV 3' Genomic Promoter Region
Nucleotide Sequence

                     Nucleotide number:
Virus                26    42    50    96

Edmonston w-t        A     A     G     G

Vaccines:
  Rubeovax™          T     C     G     G
  Moraten            T     C     G     G
  Schwarz            T     C     G     G
  Zagreb             T     T     G     A
  AIK-C              T     C     G     G

Wild-Types:
  1977               A     A     A     G
  1983               A     A     A     G
  Montefiore         A     A     A     G
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Table 4 

Differences in MV L Nucleotides and Amino Acids 
Between Edmonston Wild-Type and Vaccine Strains 

               331   1409  1624  1649  1717  1887  1936  2074  2114

Edmonston w-t  ATT   GCA   ACC   AGG   GAT   AAC   CAT   CAA   AGA
Mutation       ACT   ACA   GCC   ATG   GCT   GAC   TAT   CGA   AAA
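
The codon pairs in Table 4 can be translated with the standard genetic code to show the amino acid change at each position. This is a sketch only: it assumes the column headings are L protein residue numbers, the lookup table covers just the codons that appear in Table 4, and `describe_changes` is an illustrative helper.

```python
# Partial standard genetic code: only the codons that appear in Table 4.
CODON_TO_AA = {
    "ATT": "Ile", "ACT": "Thr", "GCA": "Ala", "ACA": "Thr", "ACC": "Thr",
    "GCC": "Ala", "AGG": "Arg", "ATG": "Met", "GAT": "Asp", "GCT": "Ala",
    "AAC": "Asn", "GAC": "Asp", "CAT": "His", "TAT": "Tyr", "CAA": "Gln",
    "CGA": "Arg", "AGA": "Arg", "AAA": "Lys",
}

# Column headings and codons copied from Table 4 (antigenomic, message sense).
POSITIONS = [331, 1409, 1624, 1649, 1717, 1887, 1936, 2074, 2114]
WILD_TYPE = "ATT GCA ACC AGG GAT AAC CAT CAA AGA".split()
MUTATION = "ACT ACA GCC ATG GCT GAC TAT CGA AAA".split()


def describe_changes(positions, wild_type, mutant):
    """Pair each wild-type codon with its mutant codon and translate both."""
    return [
        (pos, CODON_TO_AA[wt], CODON_TO_AA[mut])
        for pos, wt, mut in zip(positions, wild_type, mutant)
    ]


changes = describe_changes(POSITIONS, WILD_TYPE, MUTATION)
for pos, wt_aa, mut_aa in changes:
    print(f"{pos}: {wt_aa} -> {mut_aa}")
```

Each listed mutant codon differs from its wild-type codon by a single nucleotide and changes the encoded amino acid.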
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Example 2 
PIV-3 

A comparison of sequences (in antigenomic
message sense) of the parental wild-type JS strain of
PIV-3 virus and the FRhL-grown and Vero-grown forms of
the cp45 mutant is set forth in Table 6. Where a
codon change does not result in an amino acid change,
Table 6 states "none", followed by the name of the
unchanged amino acid.
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Sequence analysis of the parental wild-type
JS strain of PIV-3 virus and the FRhL-grown cp45 mutant
showed that the latter contained 20 nucleotide changes.
Four changes were in the noncoding 3'-leader region at
nucleotide positions 23 (T -> C), 24 (C -> T), 28 (G ->
T) and 45 (T -> A) (in antigenomic, message sense).
When considered in the genomic, negative sense, the
change at position 28 from the smaller pyrimidine ("C")
to the larger purine ("A") may change the size of the
region flanked by the conserved regions of the 3'
genomic promoter region, resulting in an altered
spatial presentation of the cis-acting signals to the
polymerase.

Nine changes were coding changes in the NP,
M, F, HN and L genes. The other seven changes were
non-coding or silent changes in the NP, P, F, HN and L
genes or the NP untranslated region (UTR). The cp45
mutant has been demonstrated to have poor transcription
activity at non-permissive temperatures due to its ts
phenotype (87). This ts phenotype has now been mapped
to the viral L gene (88). Because the cp45 virus has
been shown to function normally with regard to
mutations in the HN and F glycoproteins (87), this
supports the implication that mutations in the 3'-leader
and L gene contributed to the attenuating
phenotype of this virus.

Thus, the four 3' leader specific changes in
FRhL-grown cp45 and the three coding changes in the L
gene at amino acid positions 942 (Tyr -> His), 992 (Leu
-> Phe) and 1558 (Thr -> Ile) contributed significantly
to the attenuation phenotype of the candidate cp45
vaccine strain.

Furthermore, the Vero-grown cp45 mutant
vaccine strain contains an additional mutation
resulting from a coding change in the L gene (marked
with an asterisk in Table 6) at amino acid residue 1292
(leucine -> phenylalanine).

The first two amino acid changes in the L
protein (at positions 942 and 992) map to one of the
highly conserved areas among all Paramyxovirus L genes.
The fourth amino acid change (at position 1558) maps to
the area joining two conserved blocks corresponding to
the change at amino acid 1717 in the MV vaccine
strains.

The published literature (89) sets forth only
18 changes between the antigenomic message sense
sequences of the JS and FRhL-grown cp45 strains.
Sixteen of these changes were found by applicants.

The published literature did not report four
changes found by applicants: in the 3' leader at
nucleotide 45 (T -> A), in the NP UTR at nucleotide 62
(A -> T), or the changes in amino acids in the NP
protein resulting from the changes at nucleotide 397 (T
-> C), leading to the amino acid change (Val -> Ala) and
nucleotide 1275 (T -> G), leading to the amino acid
change (Ser -> Ala) (nucleotide changes in antigenomic,
message sense). Nor did the published literature
report the additional potentially attenuating mutation
in the L protein found by applicants in the Vero-grown
cp45 strain resulting from the change at nucleotide
12521 (A -> T), leading to the change in amino acid
1292 (Leu -> Phe).
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Example 3 
RSV Subgroup B 

The temperature-sensitive (ts) phenotype is
strongly associated with attenuation in vivo; in
addition, some non-ts mutations may also be
attenuating. Identification of ts and non-ts
attenuating mutations was achieved by sequence analysis
and evaluation of ts, cold-adapted (ca), and in vivo
growth phenotypes of RSV mutants and revertants.

The genomes of the following five RSV 2B
strains have now been completely sequenced: 2B parent,
2B33F, one revertant designated 2B33F TS(+), 2B20L and
one revertant designated 2B20L TS(+). The 2B33F and
2B20L strains are ts and ca and are described in U.S.
Serial No. 08/059,444 (90), which is hereby
incorporated by reference. After identifying regions
where mutations in 2B33F and 2B20L are located, nine
additional isolates of 2B33F "revertants" obtained
following in vitro passaging at 39°C and in vivo
passaging in African Green Monkeys or chimpanzees, and
nine additional isolates of 2B20L "revertants" obtained
following in vitro passaging at 39°C have been
sequenced in those regions. The ts, ca, and
attenuation phenotypes of many of these revertants have
now been characterized and assessed. Correlations
between ts phenotype, vaccine attenuation and sequence
changes have been identified.

A summary of results is presented in Tables
7-12.
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Table 7 

Sequence comparison between RSV 2B and 2B33F strains 
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
       4525      T     C     C     Ile-Thr (75)
       4526      T     C     C     Ile-Thr (75)
       4542      T     C     C     Stop-Gln (81)
       4561      T     C     C     Leu-Pro (87)
       4575      T     C     C     Trp-Arg (92)
       4598      T     C     C     none Thr (99)
L      9559      G     A     A     Arg-Lys (353)
       9853*     A     G     A     Lys-Arg (451)*
       12186     G     A     A     Asp-Asn (1229)
       14587     C     T     T     Thr-Ile (2029)
       15071     A     G     G     non-coding

†  For 2B33F and 2B33F TS(+), nucl. pos. numbers
   are one larger than for 2B for M, SH & L genes
*  At pos. 9853, the Lys-Arg change has reverted
   back to Lys in the 2B33F TS(+) strain
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Table 8 

Sequence comparison between RSV 2B and 2B20L strains 

Gene/       Nucl. pos.†                               RSV 2B20L TS(+),
region      (3' end of vRNA)   RSV 2B   RSV 2B20L     R1 revertant       Amino acid changes
Genomic     4                  C        G             G                  non-coding*
Promoter    6                           extra A       extra A            non-coding*
L           8963               C        T             T                  none, Thr (154)
            13347              A        A             G                  Asn-Asp (1616)
            14587              C        T             T                  Thr-Ile (2029)*
            14649              A        G             G                  Asn-Asp (2050)
            14650              A        A             T                  Asn-Asp-Val (2050)**

† For 2B20L and 2B20L TS(+), nucl. pos. numbers are one larger than for 2B for L gene
* Mutation is common in 2B33F and 2B20L strains
** At pos. 14650, the mutation suppresses the ts phenotype in 2B20L TS(+) revertant
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Table 10 
2B33F Revertants 

                       ts(+) In vitro     AGM                    Chimp
Gene   base no.†       5a   4a   3b       pp2  pp4  pp6  pp7     1A   3A   5A
M      4176, 4200      S    S    S        S    S    S    S       S    S    S
SH     14 bases*       S    S    S        S    S    S    S       S    S    S
L      9560            S    S    S        S    S    S    S       S    S    S
       9854            2B   2B   2B       2B   S    S    S       ND   2B   2B
       12187           S    S    S        S    S    S    S       S    S    S
       14588           S    S    S        S    S    S    S       ND   S    S
       15072           S    S    S        S    S    S    S       S    S    S

Phenotype
       ts              2B   2B   2B       r    r    S    S       2B   2B   2B
       ca              S    S    S        2B   S    2B   S       ND   ND   ND
       Attenuated      r    r    r        (r)  (r)  S    S       ND   r    r

† These 2B33F revertant base nos. are one larger than for 2B for M, SH and L genes
* bases 4330, 4410, 4421, 4443, 4455, 4485, 4498, 4506, 4526, 4527, 4543, 4562, 4576, 4599
S = same base as 2B33F
2B = reversion to 2B base or complete reversion in phenotype
r = moderate reversion in phenotype
(r) = slight reversion in phenotype
ND = not done
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Table 11 
2B20L Revertants 

                       TS(+) In vitro Isolates
Gene   base no.†       R1   R2   R3A  R4A  R5A  R6A  R7A  R8A  R9A  R10A
L      8964            S    S    S    S    S    S    S    S    S    S
       13348           C*   S    ND   S    S    ND   S    S    S    S
       14588           S    S    S    S    S    S    S    S    S    S
       14650           S    S    2B   S    2B   2B   S    S    2B   2B
       14651           A*   A*   S    A*   S    S    A*   A*   S    S

Phenotype
       ts              2B   2B   ND   ND   ND   ND   ND   ND   2B   2B
       Attenuated      r    r    ND   ND   ND   ND   ND   ND   r    r

† These 2B20L revertant base nos. are one larger than for 2B for the L gene
S = same base as 2B20L
2B = reversion to 2B base
r = moderate reversion in phenotype
* = base change, different from 2B or 2B20L
ND = not done
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Table 12 

RSV 2B, ts and Revertant Strains: Phenotype Summary 



Virus Isolate 


Source 


In Vitro 
Phenotype 
ts ca 


In Vivo 
Attenuation 
Cotton AGM 
Rat 


RSV 2B 


Wild- type Parent Strain 










RSV 2B33F 


ca, ts mutant isolated 
from 2B, cold-passaged 
x 33 




++++ 


+++ 


RSV 2B33F - 5a 
TS(+) 


2B33F spinner passage 
plaque picked at 39°C 




++ 


++ 


+ 


RSV 2B33F - 4a 
TS(+) 


2B33F spinner passage 
plaque picked at 39°C 




++ 


++ 


ND 


RSV 2B33F - 3b 
TS(+) 


2B33F spinner passage 
plaque picked at 39°C 


- 


++ 


++ 


ND 


AGM pp2 


2B33F-infected AGM A2, 
d7 nasal wash plaque 
picked at 32°C 






+++ 


ND 


AGM pp4 


2B33F-infected AGM A2, 
d7 nasal wash plaque 
picked at 32°C 


+ 


++ 


+++ 


ND 


AGM pp6 


2B33F-infected AGM A4, 
d12 nasal wash plaque 


++++ 




++++ 


ND 


AGM pp7 


2B33F-infected AGM A4, 
d12 nasal wash plaque 
picked at 32°C 


++++ 


++ 


++++ 


ND 


Chimp pplA 


2B33F-infected chimp 
#1552, d4 tracheal 
lavage, plaque picked 
at 32°C 




ND 


ND 


ND 


Chimp pp3A 


2B33F-infected chimp 
#1560, d6 tracheal 
lavage, plaque picked 
at 32°C 




ND 




ND 


Chimp pp5A 


2B33F-infected chimp 
#1563, d10 tracheal 
lavage, plaque picked 
at 32°C 




ND 


++ 


ND 
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Table 12 (continued) 
RSV 2B, ts and Revertant Strains: Phenotype Summary 



Virus Isolate 


Source 


In Vitro 
Phenotype 
ts ca 


In Vivo 
Attenuation 
Cotton AGM 
Rat 


RSV 2B20L 


ca, ts mutant isolated 
from 2B, cold-passaged 
x 20 


++++ 


++ 


++++ 


++++ 


RSV 2B20L R1 
TS(+) 


2B20L spinner passage 
plaque picked at 39°C 




ND 


++ 


ND 


RSV 2B20L R2 
TS(+) 


2B20L spinner passage 
plaque picked at 39°C 




ND 


++ 


ND 


RSV 2B20L R9 
TS(+) 


2B20L spinner passage 
plaque picked at 39°C 




ND 


++ 


ND 


RSV 2B20L R10 
TS(+) 


2B20L spinner passage 
plaque picked at 39°C 




ND 


++ 


ND 



ND = not done 
- = wild-type phenotype, i.e., not temperature sensitive, not cold adapted, not attenuated 
+ to ++++ = increasing levels of temperature sensitivity, cold-adaptation or attenuation 
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Several significant observations can be drawn 
from these data: 

a. As shown in Tables 7 (for 2B33F) and 8 (for 
2B20L), there are relatively few sequence changes 
identified in the two mutant strains: RSV 2B33F 
differs from parental RSV 2B by two changes at the 3' 
genomic promoter region, two changes at the non-coding 
5'-end of the M gene, and four coding changes plus one 
non-coding (poly(A) motif) change in the RNA dependent 
RNA polymerase coding L gene. In addition, 14 changes 
mapped to the SH gene alone. RSV 2B20L differs from 
its RSV 2B parent at only seven nucleotide positions, 
of which three are common with 2B33F virus, including 
two changes at the 3' genomic promoter and one coding 
change in the L gene. Two additional unique changes of 
2B20L virus mapped to the coding region of the L gene. 
Potentially attenuating mutations at the non-coding 3' 
genomic promoter region and the RNA dependent RNA 
polymerase gene have been identified. 

b. Two ts mutations can be identified in the L 
gene of the attenuated virus strains 2B33F and 2B20L: 

(i) In 2B33F, a mutation at nucleotide position 
9853 (A → G) leading to a coding change in L protein 
at amino acid 451 (Lys → Arg) is clearly associated 
with the ts and attenuation phenotypes. Reversion at 
this site alone in the 2B33F TS(+) 5a strain is 
responsible for complete restoration of growth at 39°C 
(Table 9) and partial reversion in attenuation in 
animals. This association with the ts and attenuation 
phenotypes was also supported by partial sequence 
analyses of six additional "full TS revertants" 
(designated 4a, 3b, pp2, 3A, 5a, 5A) isolated from cell 
culture and from chimps, in which only the nucleotide 
9853 mutation reverted (Tables 10-12) (note that one 
AGM (African Green Monkey) isolate which reverted at 
9853 only partially reverted in ts phenotype). This 
amino acid 451 mutation (Lys → Arg) is amenable to 
stabilization in cDNA infectious clone constructs, by 
inserting a second mutation to stabilize the codon, 
thereby lessening the likelihood that it will revert 
back to Lys. 
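
The idea of stabilizing a codon with a second mutation can be illustrated with a small calculation. The following is a minimal sketch, not the method of the specification: it assumes the standard genetic code (written with T in place of U) and simply counts how many single-nucleotide changes separate each arginine codon from the nearest lysine codon. The text does not state which Arg codon is present in 2B33F or which would be engineered, so the codon lists and function names here are illustrative assumptions.

```python
# Standard genetic code codons for the two amino acids discussed above
# (message sense, T written for U). Which Arg codon 2B33F actually carries
# is not stated in the text; all six are listed for comparison.
LYS_CODONS = ["AAA", "AAG"]
ARG_CODONS = ["AGA", "AGG", "CGA", "CGC", "CGG", "CGT"]


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_changes_to_lys(codon):
    """Fewest nucleotide substitutions needed to turn `codon` into any Lys codon."""
    return min(hamming(codon, lys) for lys in LYS_CODONS)


# Map each Arg codon to the number of substitutions required for reversion to Lys.
reversion_cost = {codon: min_changes_to_lys(codon) for codon in ARG_CODONS}

for codon, cost in sorted(reversion_cost.items()):
    print(codon, cost)
```

Under these assumptions, AGA and AGG are one substitution away from a Lys codon, whereas the CGN codons require two or three, which is one way a second nucleotide change could lessen the likelihood of reversion.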

(ii) In 2B20L, a mutation at base 14,649 (A → G) 
leading to a coding change in the L protein (amino acid 
position 2,050, Asn → Asp) appears to be associated 
with the ts and attenuation phenotypes. This aspartic 
acid at amino acid 2050 invariably reverts back 
(Asp → Asn) in TS(+) revertants or changes to a 
different amino acid (Asp → Val) by nucleotide 
substitution at position 14,650 (A → T) (Tables 8, 
11). The above observation is based on complete 
sequence analysis of the TS(+) revertant R1 and partial 
sequence of several additional TS(+) revertants (R2, 
R4A, R7A, R8A) at selected regions (Table 11). An 
additional mutation is seen in the R1 revertant at 
nucleotide position 13,347 (amino acid 1616, Asn → 
Asp) associated with the above reversion. However, the 
effect of this mutation on the ts phenotype is not 
known; the L gene of other revertants has not been 
sequenced completely. 

c. Three base changes are common to 2B33F and 
2B20L strains of virus: 

(i) A change at position 14,587 (C → T) with a 
corresponding change (Thr → Ile) at amino acid 2029 is 
present in both 2B33F and 2B20L (Tables 7, 8). This 
nucleotide "T" substitution was found to be present in 
10% of the population of the progenitor RSV 2B strain 
and may have been preferred during the attenuation 
process. No wild-type base "C" was found in the 2B33F 
and 2B20L virus. 

(ii) Two mutations are seen in the 2B33F and 2B20L 
3' genomic promoter region: nucleotide 4 (C → G) and 
the insertion of an extra A in the stretch of A's at 
positions 6-11 (in antigenomic, message sense). When 
the sequences of selected TS(+) revertants were 
analyzed, these mutations were seen to have been 
retained in the 2B33F TS(+) 5a (Table 7) and the 2B20L 
TS(+) R1 (Table 8) revertants. These non-coding, cis- 
acting mutations remained associated with partial viral 
attenuation. 

Expression using the minireplicon RSV-CAT 
system for the analysis of these cis-acting changes has 
shown the 3' genomic promoter nucleotide 4 (C → G) 
change to cause an upregulation of 
transcription/replication in this in vitro system when 
the 2B progenitor virus or either of the 2B33F or 2B33F 
TS(+) viruses provided helper L gene functions (the N, P 
and M2 genes are identical in these viruses). 

Complementation analysis of the 2B33F 3' 
genomic promoter and the helper functions provided by 
the progenitor RSV 2B virus or the 2B33F and 2B33F TS(+) 
viruses by this RSV-CAT minireplicon system has also 
been conducted. All three viruses supported both the 
2B and 2B33F 3' genomic promoter mediated 
transcription/replication functions. However, the 
2B33F and 2B33F TS(+) viruses preferred their 2B33F 3' 
genomic promoters. This analysis clearly shows co- 
evolution of 3' genomic promoter changes during the 
vaccine attenuation process, along with the RNA 
dependent RNA polymerase gene. Reversion of the ts 
phenotype in the 2B33F mutant 5a by reversion of the 
single L protein amino acid 451 (Arg → Lys), identified 
by sequence analysis, was clearly demonstrated by support 
of transcription/replication functions of the RSV-CAT 
minireplicon at 37°C. The 2B33F virus did not provide 
helper functions to the RSV-CAT minireplicon (with 2B 
or 2B33F 3' genomic promoters) at 37°C. 

d. A biased hypermutation of SH seen in 2B33F is 
present in all 2B33F revertants, regardless of 
phenotype, and is not seen in 2B20L, which is ts, ca, 
and attenuated. Thus, there are no data at this time 
that associate this mutation with any biological 
phenotype. 

Another wild-type RSV designated 18537 was 
also sequenced and compared to the sequence of the 
wild-type RSV 2B strain. With one exception, at all 
the critical residues described above, the two wild- 
type strains were identical. For 2B, the codon ACA at 
nucleotides 14586-14588 encodes a Thr at amino acid 
2029 of the L protein, while for 18537, the codon ATT 
at nucleotides 14593-14595 encodes an Ile at amino acid 
2029 (the L gene start codon is at nucleotides 8509- 
8511 in 18537, compared to 8502-8504 in 2B). 
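
The relationship between the nucleotide positions and the amino acid numbers quoted above can be checked with simple arithmetic. The following is a minimal sketch under two assumptions: that the L start codon is counted as codon 1, and that positions are given in the parental (2B or 18537) numbering. The function names and the three-entry codon table are illustrative only and are not part of the specification.

```python
# Positions are the 2B and 18537 genome coordinates quoted in the text above.
L_START_2B = 8502       # first nucleotide of the L start codon in RSV 2B
L_START_18537 = 8509    # first nucleotide of the L start codon in RSV 18537

# Only the codons needed for this illustration; not a full genetic code table.
CODON_TO_AA = {"ACA": "Thr", "ATT": "Ile", "ATA": "Ile"}


def codon_of(nt_pos, orf_start):
    """Return (codon number, position within codon), both 1-based."""
    offset = nt_pos - orf_start
    return offset // 3 + 1, offset % 3 + 1


def substitute(codon, pos_in_codon, base):
    """Return the codon with one base replaced (pos_in_codon is 1-based)."""
    return codon[:pos_in_codon - 1] + base + codon[pos_in_codon:]


codon_2b = codon_of(14586, L_START_2B)         # first base of ACA in 2B
codon_18537 = codon_of(14593, L_START_18537)   # first base of ATT in 18537
hit = codon_of(14587, L_START_2B)              # the C -> T change at 14,587
mutant = substitute("ACA", hit[1], "T")        # apply it to the 2B codon

print(codon_2b, codon_18537, hit, mutant, CODON_TO_AA[mutant])
```

Under these assumptions both start positions map to codon 2029, and the C → T change falls on the second base of ACA, giving ATA, an Ile codon. The same function places position 9853 in codon 451 and position 14,649 in codon 2050.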

Example 4 
PCR Assay to Detect Measles Virus 

A 21 year old patient was admitted to a 
hospital with a three week history of progressive non- 
productive cough, shortness of breath, and fever. His 
symptoms failed to improve following treatment with 
clarithromycin for seven days or after a similar course 
of treatment with atovaquone. Concomitant complaints 
of right upper quadrant abdominal pain proved 
recalcitrant to omeprazole and antacids. Relevant 
past medical history included Factor VIII deficiency 
and HIV infection diagnosed 3-4 years prior to this 
hospital admission. One year earlier, he had received 
a booster immunization of measles-mumps-rubella (MMR) 
vaccine as required for college enrollment. 

Bronchoalveolar lavage and transbronchial 
biopsies performed two days after admission to the 
hospital demonstrated reactive hyperplasia and alveolar 
lining cell desquamation with minimal chronic 
inflammation. No microorganisms were revealed by Gram, 
methenamine silver, or PAS stains. CT scans of the 
chest showed multiple, ill-defined, confluent nodules 
at the left lung base. Despite administration of 
empiric antimicrobials for opportunistic bacterial, 
mycobacterial, and fungal pathogens commonly 
responsible for pulmonary complications of advanced HIV 
disease, the patient became and remained febrile to 
39°C. A left-sided pleural effusion developed; 
diagnostic thoracentesis showed it to be exudative but 
otherwise non-diagnostic. Bronchoalveolar lavage 
performed three weeks later only demonstrated alveolar 
histiocytes, some of which were hemosiderin laden, a 
few lymphocytes, and neutrophils. FITE, AFB, and 
methenamine silver stains again were negative. 

Two weeks thereafter, a wedge resection of 
the left lung was performed through CT-guided 
minithoracotomy. Multiple tissue sections revealed 
nodular areas of acute and chronic inflammation with 
regions of necrosis and fibrosis. Numerous 
multinucleated giant cells were present, some of which 
contained both intracytoplasmic and intranuclear 
inclusions suggestive of measles virus giant cell 
pneumonia. Special stains for bacteria, fungi, P. 
carinii, and acid fast organisms again gave negative 
results. Electron microscopic examination of sections 
of this lung biopsy revealed particles morphologically 
consistent with paramyxoviruses such as measles virus. 
Serum anti-measles IgM titers determined by a solid 
phase hemadsorbent assay were negative, as was a 
subsequent IgM capture immunoassay. 

Two weeks later, Rhesus monkey kidney (RMK) 
tissue culture cells inoculated with the patient's lung 
biopsy material revealed cytopathic changes 
characteristic of measles virus infection. 
Confirmation was obtained using an immunofluorescence 
assay with monoclonal antibodies directed to measles 
virus. Based upon this diagnosis, oral ribavirin 
1000 mg B.I.D. was given for 14 days. Unfortunately, 
the patient progressively deteriorated, eventually 
dying two months later. 

In order to ascertain the nature of the 
measles virus present in the patient, reverse 
transcription and PCR amplification of virus obtained 
from infected tissues were performed, followed by 
sequence analysis. The measles virus isolated from 
Rhesus monkey kidney cells inoculated with tissue from 
this patient's lung biopsy was propagated by two serial 
passages in the continuous Vero (monkey kidney) tissue 
culture cell line. Total infected cell RNA was 
extracted at the second Vero cell passage using TRIzol 
reagent (Life Technologies, Grand Island, NY) according 
to the manufacturer's protocol. Total RNA was 
similarly extracted from the patient's lung biopsy 
material. The measles virus vaccine strain (Moraten) 
currently used in the United States as a component of 
the trivalent MMR vaccines was obtained in its 
univalent form (Attenuvax™, Merck Sharp & Dohme). 
This virus was passaged once in Vero cells and total 
vaccine infected cellular RNA then was extracted as 
described above. 

Each of these RNA preparations was reverse 
transcribed (RT) to cDNA using random hexameric primers 
and Moloney murine leukemia virus reverse transcriptase 
(Perkin-Elmer/Cetus RT-PCR kit reagents, Perkin-Elmer- 
Cetus, Branchburg, NJ). The cDNA then was amplified by 
PCR using measles virus-specific oligodeoxynucleotide 
primer pairs whose design was based on the Edmonston 
measles virus sequence described above. These PCR 
products comprised a set of overlapping DNA fragments 
spanning the entire 15,894 nucleotide long measles 
genome. A consensus genomic sequence was established 
by direct analysis of each PCR product, without 
cloning, using the dideoxy terminator cycle-sequencing 
method established by the manufacturer (ABI PRISM 377 
sequencer and ABI PRISM DNA sequencing kit; Perkin- 
Elmer/Cetus, Foster City, CA). Both strands of the 
PCR-amplified DNA products were analyzed to eliminate 
possible sequencing ambiguities. 
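
The requirement that the overlapping PCR fragments span the entire genome can be expressed as a simple interval check. The following is a minimal sketch only: the amplicon coordinates are hypothetical placeholders chosen for illustration (the actual primer positions are not given here), and the function name is an assumption. Only the genome length of 15,894 nucleotides is taken from the text.

```python
GENOME_LENGTH = 15894  # measles genome length stated in the text


def uncovered_regions(amplicons, genome_length):
    """Return the 1-based inclusive (start, end) stretches not covered by any amplicon."""
    gaps = []
    next_needed = 1  # first genome position not yet covered
    for start, end in sorted(amplicons):
        if start > next_needed:
            gaps.append((next_needed, start - 1))
        next_needed = max(next_needed, end + 1)
    if next_needed <= genome_length:
        gaps.append((next_needed, genome_length))
    return gaps


# Hypothetical amplicon coordinates (about 2 kb each, 200 nt overlaps).
amplicons = [
    (1, 2000), (1801, 4000), (3801, 6000), (5801, 8000),
    (7801, 10000), (9801, 12000), (11801, 14000), (13801, 15894),
]

print(uncovered_regions(amplicons, GENOME_LENGTH))

# Dropping one fragment exposes the stretch that only it covered.
without_third = amplicons[:2] + amplicons[3:]
print(uncovered_regions(without_third, GENOME_LENGTH))
```

With the placeholder coordinates the full set leaves no gaps, while omitting the third fragment leaves positions 4001-5800 uncovered. A check of this kind would flag a primer-pair design that fails to tile the genome before any sequencing is attempted.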

The nucleotide sequences of selected regions 
of the measles virus genomes present in the patient's 
viral isolate, as well as in the diseased lung tissue, 
were compared with that of the Moraten vaccine virus, 
as well as with the nucleotide sequences of other 
measles virus wild-type and vaccine strains. This 
sequence analysis revealed identity to the Moraten 
vaccine strain rather than demonstrating relatedness to 
past or currently circulating wild-type viruses or 
other measles vaccine strains. 
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Example 5 



ELISA to Detect RSV 

An ELISA test is used to detect the presence 
of RSV. Peptides are designed and selected based on 
homologies to the RSV sequences described herein to be 
specific for all subgroup B strains, or for individual 
wild-type, vaccine or revertant RSV subgroup B strains 
described herein. These peptides are then coupled to 
KLH and used to immunize rabbits for the production of 
monospecific polyclonal antibody. A selection of these 
polyclonal antibodies, or a combination of polyclonal 
and monoclonal antibodies is then used in a "capture 
ELISA" to detect the presence of an RSV antigen. 
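
One way to think about selecting peptides that are specific for a group of strains is as a set operation on fixed-length sequence windows. The following is a minimal sketch, not the selection procedure of the specification: the amino acid strings are toy placeholders rather than RSV sequences, the window length is arbitrary, and the function names are assumptions. It keeps the windows shared by every target sequence and discards any that also occur in a non-target sequence.

```python
def windows(seq, k):
    """All length-k substrings of a sequence, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_peptides(targets, non_targets, k):
    """Windows present in every target sequence and in no non-target sequence."""
    shared = set.intersection(*(windows(s, k) for s in targets))
    excluded = set().union(*(windows(s, k) for s in non_targets))
    return sorted(shared - excluded)


# Toy placeholder sequences, not real RSV protein sequences.
targets = ["AAKLMNPQRS", "GAKLMNPQTT"]
non_targets = ["TAKLMNDDDD"]

candidates = specific_peptides(targets, non_targets, k=5)
print(candidates)
```

With these toy inputs the window AKLMN is shared by both targets but is dropped because it also occurs in the non-target sequence, leaving KLMNP and LMNPQ as candidates. Real peptide design would also have to consider antigenicity and exposure, which this sketch does not address.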
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SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: Udem, Stephen A. 
               Sidhu, Mohinderjit S. 
               Tatem, Joanne H. 
               Murphy, Brian R. 
               Randolph, Valerie B. 

(ii) TITLE OF INVENTION: 3' Genomic Promoter Region and 
Polymerase Gene Mutations Responsible for Attenuation in 
Viruses of the Order Designated Mononegavirales 

(iii) NUMBER OF SEQUENCES: 79 

(iv) CORRESPONDENCE ADDRESS: 
(A) ADDRESSEE: American Home Products Corporation 
(B) STREET: One Campus Drive 
(C) CITY: Parsippany 
(D) STATE: New Jersey 
(E) COUNTRY: United States 
(F) ZIP: 07054 

(v) COMPUTER READABLE FORM: 
(A) MEDIUM TYPE: Floppy disk 
(B) COMPUTER: IBM PC compatible 
(C) OPERATING SYSTEM: PC-DOS/MS-DOS 
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 
(A) APPLICATION NUMBER: US 
(B) FILING DATE: 
(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 
(A) NAME: Gordon, Alan M. 
(B) REGISTRATION NUMBER: 30,637 
(C) REFERENCE/DOCKET NUMBER: 33,294 PCT 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: 973/683-2157 
(B) TELEFAX: 973/683-4117 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 15894 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 



ACCAAACAAA 


GTTGGGTAAG 


GATAGATCAA 


X \~T%JX X <JA X 


XnX lLlAvlu 


UVL X 1 AisuAl 


o U 


TCAAGATCCT 


ATTATCAGGG 


ACAAGAGCAG 


GATTAGGGAT 




X X X 




TAAGGAGCTT 


AGCATTGTTC 


AAAAGAAACA 


AGGACAAACC 


JtrrriTTara 

**V.\^\_f\X X j\s~~t\ 


X UUibiA IVfUu 


i an 


GTGGAGCCAT 


CAGAGGAATC 


AAACACATTA 


TTATAGTACC 


AAI V-1mI_> X WJA 


uAl X X WLA 




TTACCACTCG 


ATCCAGACTT 


CTGGACCGGT 


A U\J X \v#\VW X X 


4u\ X x uKmuiaI. 




1 Art 


GCGGGCCCAA 


ACTAACAGGG 


GCACTAATAG 


GTATATTATC 


C*TT A TTTftTfi 

\— X iAi. 1 iUlU 


fJirTrrrnr 

UT/lv X V_ X l*V*Jlv 


JDU 


GTCAATTGAT 


TCAGAGGATC 


ACCGATGACC 


CTGACGTTAG 


CATAAGGCTG 


T'I'A£IAf2f2*P*Pr2 

X X A\J/\U\J 1 1V9 


42 0 


TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 


ATGAGGCGGA 


CCAATACTTT 


TCACATGATG 


ATCCAATTAG 


T AGTG A Tr* A A 


x V— \— /vvjva x x wvv 


540 


GATGGTTCGA 


GAACAAGGAA 


ATCTCAGATA 


TTGAAGTGCA 


AG JLCCCTCZ a a 


nnATTPAAr'A 


60 0 


TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660 
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720 
TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780 
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840 
GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900 
GATTAGCCAG XTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 1020 
AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080 
GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140 
ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200 
GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260 
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320 
AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 1380 
GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 1440 
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500 
CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 1560 
CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 1620 
CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 1680 
ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 1800 
GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920 
ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 1980 
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040 
CGCGGCCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 2100 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA 2160 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220 
AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400 
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 2460 
GGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520 
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580 
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700 
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820 
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880 
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940 
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060 
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 3180 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 3300 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360 
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420 
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480 
AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 3600 
TGCTGGGGGT TGTTGAGGGC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 3660 
CCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AGAACTCCTC AAAGAGGCCA 3720 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 3900 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 3960 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 4020 
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 4080 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 4140 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200 
GCACCAGTCT TCACATTAGA AGCACAGGCA AGATGAGCAA GACTCTCCAT GCACAACTCG 4260 
GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 4320 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 4380 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440 





TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 4500 
GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 4560 
GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA CACAAGGCCA 4620 
CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 4680 
TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 4740 
ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 4800 
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 4860 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 4920 
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGTTC 4980 
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 5040 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 5160 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 5220 
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280 
CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340 
CTCCTCCTTT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 5400 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460 
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAGACACCC 5520 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 5580 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 5820 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 5940 
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000 
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ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TCCACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTCTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
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AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 7620 

CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 7680 

TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 7740 

GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 7800 

CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 7860 

ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA AGTCGAGGTT ACAATGTGTC 7920 

ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 7980 

TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 8040 

AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 8100 

GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGQGAGC TCAAACTCGC 8160 

AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 8220 

CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280 

CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340 

TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 8400 

AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460 

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 8520 

GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580 

CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640 

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC AACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820 

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAGGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940 

GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060 

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 9120 

ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240 

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600 

AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780 

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900 

TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080 

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140 

CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200 

ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 10260 

TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320 

CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380 

AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440 

GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500 

ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560 

TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620 

ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 10680 

AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 10740 

TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 10800 

TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 10860 

GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 10920 

TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 10980 

ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 11040 

GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 11100 

GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 11160 

ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 11220 

ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 11280 

TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 11340 

CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 11400 

ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 11460 

GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 11520 

GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 11580 

TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 11640 

ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 11700 

CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760 

TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820 

AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880 

ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 11940 

CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 12000 

ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 12060 

TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120 

ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 12180 

CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240 

TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 12300 

TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360 

AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420 

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 12480 

AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540 

TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 12600 

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 12660 

ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720 

TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 12780 

GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840 

AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900 

TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960 

CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020 

CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080 

CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 13140 

CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200 

CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 13260 

TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320 

TTrarnrrna [illegible] TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380 

CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 13440 

CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500 

AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 13560 

CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620 

TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 13680 

TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740 

GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 13800 

AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860 

GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 13920 

CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980 

AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 14040 

GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 14100 

GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 14160 

ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 14220 

TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 14280 

GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340 

CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCCCCACACG 14400 

ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460 

GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 14520 

CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 14580 

ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 14640 

AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700 

AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760 

TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 14820 

TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880 

AGACCTTGCC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 14940 

TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000 

GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 15060 

TATACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 15120 

AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 15180 

GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240 

CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 15300 

CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360 

AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480 

CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540 

TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600 

ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720 

TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780 

ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840 

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu 
1 5 10 15 

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala 
20 25 30 

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn 
35 40 45 

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn 
50 55 60 

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro 
65 70 75 80 

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn 
85 90 95 

WO 98/13501 PCT/US97/16718 

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys 
100 105 110 

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu 
115 120 125 

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp 
130 135 140 

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln 
145 150 155 160 

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg 
165 170 175 

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr 
180 185 190 

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp 
195 200 205 

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr 
210 215 220 

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met 
225 230 235 240 

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly 
245 250 255 

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu 
260 265 270 

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu 
275 280 285 

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe 
290 295 300 

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly 
305 310 315 320 

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr 
325 330 335 

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe 
340 345 350 

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 
355 360 365 

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 
370 375 380 

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr 
385 390 395 400 

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His 
405 410 415 

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr 
420 425 430 

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe 
435 440 445 

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 
450 455 460 

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr 
465 470 475 480 

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 
485 490 495 

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 
500 505 510 

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe 
515 520 525 

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg 
530 535 540 

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala 
545 550 555 560 

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly 
565 570 575 

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 
580 585 590 

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 
595 600 605 

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn 
610 615 620 

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln 
625 630 635 640 

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val 
645 650 655 

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg 
660 665 670 

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly 
675 680 685 

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val 
690 695 700 

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile 
705 710 715 720 

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met 
725 730 735 

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile 
740 745 750 

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser 
755 760 765 

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro 
770 775 780 

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr 
785 790 795 800 

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His 
805 810 815 

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr 
820 825 830 

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys 
835 840 845 

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr 
850 855 860 

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu 
865 870 875 880 

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val 
885 890 895 

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met 
900 905 910 

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 
915 920 925 



Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 
930 935 940 

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 
945 950 955 960 

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 
965 970 975 

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 
980 985 990 

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 
995 1000 1005 

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 
1010 1015 1020 

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 
1025 1030 1035 1040 

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 
1045 1050 1055 

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 
1060 1065 1070 

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 
1075 1080 1085 

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 
1090 1095 1100 

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 
1105 1110 1115 1120 

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 
1125 1130 1135 

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 
1140 1145 1150 

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 
1155 1160 1165 

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 
1170 1175 1180 

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 
1185 1190 1195 1200 

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 
1205 1210 1215 

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 
1220 1225 1230 

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 
1235 1240 1245 

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 
1250 1255 1260 

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 
1265 1270 1275 1280 

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 
1285 1290 1295 

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 
1300 1305 1310 

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 
1315 1320 1325 

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 
1330 1335 1340 

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 
1345 1350 1355 1360 

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 
1365 1370 1375 

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 
1380 1385 1390 

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 
1395 1400 1405 

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 
1410 1415 1420 

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 
1425 1430 1435 1440 

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 
1445 1450 1455 

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 
1460 1465 1470 

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 
1475 1480 1485 

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 
1490 1495 1500 

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 
1505 1510 1515 1520 

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 
1525 1530 1535 

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 
1540 1545 1550 

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 
1555 1560 1565 

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 
1570 1575 1580 

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 
1585 1590 1595 1600 

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 
1605 1610 1615 

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 
1620 1625 1630 

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 
1635 1640 1645 

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 
1650 1655 1660 

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 
1665 1670 1675 1680 

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 
1685 1690 1695 

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 
1700 1705 1710 

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 
1715 1720 1725 

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 
1730 1735 1740 

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 
1745 1750 1755 1760 

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 
1765 1770 1775 

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 
1780 1785 1790 

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 
1795 1800 1805 

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 
1810 1815 1820 

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 
1825 1830 1835 1840 

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 
1845 1850 1855 

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 
1860 1865 1870 

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys 
1875 1880 1885 

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 
1890 1895 1900 

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 
1905 1910 1915 1920 

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 
1925 1930 1935 

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 
1940 1945 1950 

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 
1955 1960 1965 

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 
1970 1975 1980 

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 
1985 1990 1995 2000 

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 
2005 2010 2015 

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly 
2020 2025 2030 

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 
2035 2040 2045 

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 
2050 2055 2060 

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 
2065 2070 2075 2080 

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 
2085 2090 2095 

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 
2100 2105 2110 

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr 
2115 2120 2125 

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 
2130 2135 2140 

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 
2145 2150 2155 2160 

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 
2165 2170 2175 

Tyr Ser Ala Leu Ile Lys Asp 
2180 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTC 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCGGGA GATTCCTGAA 240 

TTACCACTCG ATCTAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300 

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420 

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATATTTT TCACATGATG ATCCAAGTAG TAGTGATCAA TCCAGGTTCG 540 

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATCCTA GCTCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720 

TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780 

AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840 

GGAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900 

GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960 

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAT CTTTACCAGC 1020 

AAATGGGGGA AACTGCACCA TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080 

GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140 

ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTCGA TCCAGCATAT TTCAGACTAG 1200 

GGCAAGAGAT GGTGAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260 

GTATCACTGC CGAAGATGCA AGGCTTGTTT CAGAGATCGC AATGCATACT ACAGAGGACA 1320 

GGATCAGTAG AGCGGTTGGA CCCAGACAAT CCCAAGTGTC ATTCCTACAC GGTGATCAAA 1380 

ATGAAAATGA GCTACCGAGA TGGGGGGGTA AGGAAGATAT GAGGGTCAAA CAGAGTCGGG 1440 

GAGAAGCCAG AGAGAGCTAC AGAGAAACCA GGCCCAGCAG AGCAAGTGAC GCGAGAGCTA 1500 

CCCATCCTCC AACCGACACA CCCTTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 1560 

CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTCAGGCT GCAAGCCATG GCAGGAATCT 1620 

CGGAAGAACA AGGCTCAGAC ACGGACACCC CTAGAGTGTA CAATGACAGA GATCTTCTAG 1680 

ACTAGGTGCA AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740 

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCACCA ACCATCCACT CCCACGATTG 1800 

GGGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860 

CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920 

ATATCAGACA ACCCAGGACA GGAGCGAGCC GCCTGCAAGG AAGAGAAGGC AAGCAGTCCG 1980 

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040 

CGCGGTCAGG GATCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCTCAGGA 2100 

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA 2160 

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220 

AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280 

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340 

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400 

AGAGGCAACA ACTTTCCAAA GCTTAGGAAA ACTCTCAATG TTCCCCCGCC CCCGGACCCT 2460 

GGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520 

TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580 

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640 

GCCGTACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700 

AATAATGAAG AAGGGGGAGA TTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760 

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCACCAA GCTAGAATCA 2820 

CTGCTGTTAT TGAAGGGGGA AGTTGAGTCA ATCAAGAAGC AGATCAACAG GCAAAATATC 2880 

AGCATATCCA CCTTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940 

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000 

GGCAGAGATT CAGGCCGAGC ACTGGCTGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060 

ATCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120 

CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCGGA CACCGGCCCT 3180 

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240 

CGTTACCTGA TGACTCTCCT TGATGACATC AAAGGAGCCA ACGATCTTGC CAAGTTCCAC 3300 

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360 

CCAGTCGACC TAGCTAATAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420 

GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480 

AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540 

TCAGAGTCAT AGATCCTGGT CTAGGCGACA GAAAAGATGA ATGTTTTATG TACATGTTTC 3600 

TGCTGGGGGT TGTTGAGGAC AGCGATCTCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 3660 

CTCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAACCCGA AGAACTCCTC AAAGAGGCCA 3720 

CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780 

ACAACACCCC ACTAACTCTC CTCATACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840 

TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTGGATACC CCGCAGAGGT 3900 

TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCAGATAA CGGGTATTAC ACCGTTCCTA 3960 

GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 4020 

GGATTGACAA GGCGATTGGC CATGGGAAGA TCATCGACAA TGCAGAGCAA CTTCCTGAGG 4080 

CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA AAGTGAAGTC TACTCTGCCG 4140 

ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200 

GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260 

GGTTCAAGAA GACCCTATGT TACCCACTGA TGGATATCAA TGAAGACCTT AATCGATTAC 4320 

TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 4380 

AAGAATTCCG CATTTACGAC GACGTTATCA TAAATGATGA CCAAGGATTA TTCAAAGTTC 4440 

TGTAGACCGT AGTGCCCAGC AATGCCCGAA GACGACCCTC CTCACAATGA CAGCCAGAAG 4500 

GCCCGGAAAA AAAGGCCCCC TCCGAAAGAC TCCACAGACC AAATGAGAGG CCAGCCAGCA 4560 

GCTGACGGCA AGCACGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA CATAAGGCCA 4620 

CCACCAGCCA TCCCAATCTG CATCCTCCTC GTAGGACCCC CGAGGACCAA CCCCCAAGGT 4680 

TGCCCCCCAC CCAAACCACC AACCGCATCC CTACCACCCC CGGGAAAGAA ACCCCCAGCA 4740 

ACTGGAAGAG CCCTTCCCCT TTCCCTCAAC ACAAGAACTC CACAACCGAA CCACACAAGC 4800 

GACCGAGGTG ACCCAACCGC AGGCACCCGA CTCCCTAGAC AGATCCTCTC CCCCTGGCAA 4860 

ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 4920 

CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980 

CCCCGGTGCC CACAGGCAGG CACACCAACC CCCGAACAGA CCCAGCACCC AGCCATCGAC 5040 

AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100 

GAGGAAGCCC ACCCACCCCA CACACGACCA CGACAACCAA ACCAGAACCC AGACCACCCT 5160 

GGGCCACCAG TTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCTGCGC 5220 

ACCCCAGCCC CGATCCGGCG GGCAGCCACC CAACCCTAAC CAGCACCCAA GAGCGATCCC 5280 

CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340 

CTCTTCCTCT TCTCGAAGGG ACTAAAAGAT CAATCCACCA CATCCGACGA CACTCAACTC 5400 

CCCGTCCCTA AAGGAGACAC CGGGAATCCC GGAATTAAGA CTCATCCAAT GTCCATCATG 5460 

GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520 

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 5580 

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640 

ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700 

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGTTT 5760 

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCAG GAGTAGTCCT GGCAGGTGCG 5820 

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880 

CTGAACTCTC AAGCCATCGA CAATCTGAGG GCAAGTCTGG AAACTACTAA TCAGGCAATT 5940 

GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000 

ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 6060 

CTCGGGCTCA AATTGGTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCTAGCTTA 6120 

CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT CGGAGGAGAT 6180 

ATCAATAAGG TGTTAGAAAA GCTCGGATAT AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240 

AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 6300 

AGTATAGCCT ACCCGACGCT GTCCGAGATC AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 6360 

GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACGACTG TGCCCAAGTA TGTTGCAACC 6420 

CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480 

GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 6540 

TCCACCAAGT CCTGTGCTCG TACACTCGTA TCTGGGTCTT TTGGGAACCG GTTCATTTTG 6600 

TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 6660 

ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 6720 

GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 6780 

TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 6840 

AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 6900 

CAGATATTGA GGAGTATGAA AGGTTTGTCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 6960 

GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 7020 

AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGA 7080 

ACATCGAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGGAAC ACAAATGTCC 7140 

CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCATCCA GCATCAAGCC CACCTGAAAT 7200 

TATCTCCGGC TCCCCTTTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 7260 

ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 7320 

TCCCAAGGGA AGTAGGATAG TTATCAACAG AGAACACCTT ATGATTGATA GACCTTATGT 7380 

TTTGCTGGCT GTTCTGTTCG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CAATTGCAGG 7440 

CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 7500 

TCTAGATGTA ACTAACTCAA TTGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 7560 

AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 7620 

CATCTCTGAC AAGATTAAAT TCCTTAACCC GGATAGGGAG TACGACTTCA GAGATCTCAC 7680 

TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 7740 

GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 7800 

CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 7860 

ATTCTCAAAC ATGTCGCTGT CCCTGTTGGA CTTGTATTTA AGTCGAGGTT ACAATGTGTC 7920 

ATCTATAGTC ACTATGACAT CCCAGGGAAT GTACGGGGGA ACTTACCTAG TGGAAAAGCC 7980 

TAATCTGAGC AGCAAAGGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 8040 

AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATTTTGA 8100 

GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 8160 

AGCCCTTTGT CACGGGGGAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 8220 

CAGCTTTCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280 

CCCCTTCTCA ACGGATGACC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340 

TATCGCTGAC AATCAAGCAA AATGGGCTAT CCCGACAACA AGAACAGATG ACAAGTTGCG 8400 

AATGGAGACA TGCTTCCAGC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460 

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGAGTCTTGT CTGTTGATCT 8520 

GAGTCTAACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580 

CGGTTCAGGG ATGGACCTAT ACAAGTCCAA CCACAACAAT GAGTATTGGC TGACTATCCC 8640 

GCCAATGAAG AACCTAGCCC TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC AACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATCCT 8820 

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940 

GCCTATAAAG GGGATCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060 

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATAGCAG 9120 

ATAGGGCTGC CAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240 

CGCTATCTGT CAACCAGATC TTATACCCCG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ACAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAC TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GAAATTCGCT GTACTCTAAA GTCAGTAATA 9600 

AGGTTTTCCA ATGCTTGAGG GACACTAATT CACGGCTTGG TCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAATGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780 

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900 

TAACATTTGA GCTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CTGCTATGAC CATTGATGCT AGATATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AATTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080 

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACGGTAGAA CTCAGAGGTG 10140 

CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200 

ATGAAGGTAC TTATCACGAG TTAGTTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 10260 

TACACCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320 

CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380 

AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440 

GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500 

ATGCTCAAGC CTCAGGTGAA GGATTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560 

TTGCTGGAGT GAAATTTGGC TGCTTCATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620 

ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 10680 

AGTTCCTGCG TTACGACCCC CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 10740 

TTAATGATTC GAGCTTTGAC CCATATGATA TGATAATGTA TGTTGTAAGT GGAGCTTACC 10800 

TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 10860 

GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 10920 

TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 10980 

ATTTGACTAA GGCACTCCAC ACTCTGGCTG TCTCAGGAGT CCCTAAAGAT CTCAAAGAAA 11040 

GTCACAGAGG GGGGCCAGTC CTAAAAACCT ACTCCCGAAG CCCAGCCCAC ACAAATACCA 11100 

GGAACGTGAG GGCAGCAAAA GGGTTTATAG GGTTCCCTCA GATAATTCGG CAGGACCAAG 11160 

ACACTAATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACAACTG 11220 

ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 11280 

TAAATGAGAT TTACGGATTA CCCTCATTTT TTCAGTGGCT GCATAAGAGG CTTGAGACCT 11340 

CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 11400 

GCAAAGTCCC CAATGACCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 11460 

GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATTTATA CCTGGCTGCT TATGAGAGCG 11520 

GAGTAAGGAT TGCTTCATTA GTGCAAGGGG ACAATCAGAC CATAGCTGTA ACAAAAAGGG 11580 

TACCCAGCAC ATGGCCTTAC AACCTTAAGA AATGGGAAGC TGCTAGAGTA ACTAGAGATT 11640 

ACTTTGTAAT TCTTAGGCAA AGGCTACATG ACATTGGCCA TCACCTCAAG GCAAATGAGA 11700 

CAATTGTTTC ATCACATTTT TTTGTTTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760 

TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820 

AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880 

ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAG ATTCTGATCT 11940 

CTCTTGGCTT CACAATCAAT TCAACCATGA CCCAGGATGT AGTCATACCC CTCCTCACAA 12000 

ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 12060 

TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120 

ATCTCAAGAG AATGATTCTC GCATCACTGA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 12180 

CACAGCAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240 

TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTAA 12300 

TCCACAGTCC AAACCCAATG TTAAAGGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360 

AGGGACTGGC AGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420 

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTA GATACCACAA 12480 

AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540 

TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTAACA GGAAGAAAGA 12600 

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCC CTAAGAAGCC 12660 

ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 12780 

GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840 

AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900 

TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960 

CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020 

CAAGGCAAAG GGCTAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080 

CGACTAATTT AGCACATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 13140 

CCCTTGTCCG AGTGGCAAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200 

CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTCCTA GGGTTGGGCG 13260 

TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320 

TTCACGTCGA AACAGATTGT TGCGTGATCC CAATGATAGA TCATCCCAGG ATACCCAGCT 13380 

CTCGCAAGCT AGAGCTGAGG GCAGAGCTGT GTACCAACCC ATTGATATAT GATAATGCAC 13440 

CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTAG 13500 

AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTCT AGCTAAGTCC ACAGCACTAT 13560 

CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620 

TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTTATAGAG CCAAGATTAT 13680 

TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740 

GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 13800 

AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860 

GGCACTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 13920 

CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980 

AAGAGTTAGA AGAGTTTACA TTTCTTTTGT GTGAAAGTGA CGAGGATGTA GTACCGGACA 14040 

GATTCGACAA CATCCAGGCA AAACACTTGT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 14100 

GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 14160 

ATATCAAGGC GGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 14220 

TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTTCGGCG AGGATCGATC AAACAGATAA 14280 

GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340 

CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCCCCACACG 14400 

ATGATGTTGC AAAATTGCTC AAAGATATCA ATACAAGCAA GCACAATCTT CCCATTTCTG 14460 

GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 14520 

CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 14580 

ACGGCTTATT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 14640 

AACTAAACAA GTGCTTCTAT AATAGTGGGG TCTCTGCCAA TTCTAGATCT GGTCAAAGGG 14700 

AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760 

TTGTCAAGGT GCTCTTTAAC GGGAGGCCCG AAGTCACATG GGTAGGCAGT GTAGATTGCT 14820 

TCAATTACAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880 

AGACCTTACC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 14940 

TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000 

GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 15060 

TATACCCCAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 15120 

AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGC 15180 

GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240 

CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 15300 

CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAA CTGTGCAAAG 15360 

AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAGGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480 

CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGAATC ACTCGCAAAT 15540 

TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600 

ATCTCAAGTC CGGTTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720 

TAACAGTCAA GGAGACCAAG GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780 

ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCCAGG TGGTTAGGCA TTATTTGTAA 15840 

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu 
1 5 10 15 

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala 
20 25 30 

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn 
35 40 45 

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn 
50 55 60 

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro 
65 70 75 80 

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn 
85 90 95 

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys 
100 105 110 

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asn Lys Val Phe Gln Cys Leu 
115 120 125 

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp 
130 135 140 

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln 
145 150 155 160 

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg 
165 170 175 

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr 
180 185 190 

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp 
195 200 205 

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr 
210 215 220 

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met 
225 230 235 240 

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly 
245 250 255 

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu 
260 265 270 

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu 
275 280 285 

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe 
290 295 300 

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly 
305 310 315 320 

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Val Glu Ala Leu Asp Tyr 
325 330 335 

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe 
340 345 350 

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 
355 360 365 

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 
370 375 380 

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr 
385 390 395 400 

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His 
405 410 415 

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr 
420 425 430 

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe 
435 440 445 

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 
450 455 460 

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr 
465 470 475 480 

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 
485 490 495 

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 
500 505 510 

Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe 
515 520 525 

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg 
530 535 540 

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala 
545 550 555 560 

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly 
565 570 575 

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 
580 585 590 

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 
595 600 605 

Val Leu Lys Thr Tyr Ser Arg Ser Pro Ala His Thr Asn Thr Arg Asn 
610 615 620 

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Ile Ile Arg Gln 
625 630 635 640 

Asp Gln Asp Thr Asn His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val 
645 650 655 

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg 
660 665 670 

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly 
675 680 685 

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val 
690 695 700 

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile 
705 710 715 720 

Pro Leu Cys Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met 
725 730 735 

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile 
740 745 750 

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser 
755 760 765 

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro 
770 775 780 

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Trp Glu Ala Ala Arg Val Thr 
785 790 795 800 

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His 
805 810 815 

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr 
820 825 830 

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys 
835 840 845 

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr 
850 855 860 

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu 
865 870 875 880 

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val 
885 890 895 

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met 
900 905 910 

Thr Gln Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 
915 920 925 

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 
930 935 940 

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 
945 950 955 960 

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 
965 970 975 

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 
980 985 990 

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 
995 1000 1005 

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 
1010 1015 1020 

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 
1025 1030 1035 1040 

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 
1045 1050 1055 

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 
1060 1065 1070 

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 
1075 1080 1085 

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 
1090 1095 1100 

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 
1105 1110 1115 1120 

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 
1125 1130 1135 

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 
1140 1145 1150 

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 
1155 1160 1165 

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 
1170 1175 1180 

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 
1185 1190 1195 1200 

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 
1205 1210 1215 

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 
1220 1225 1230 

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 
1235 1240 1245 

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 
1250 1255 1260 

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 
1265 1270 1275 1280 

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 
1285 1290 1295 

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 
1300 1305 1310 

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 
1315 1320 1325 

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 
1330 1335 1340 

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 
1345 1350 1355 1360 

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 
1365 1370 1375 

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 
1380 1385 1390 

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 
1395 1400 1405 

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 
1410 1415 1420 

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 
1425 1430 1435 1440 

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 
1445 1450 1455 

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 
1460 1465 1470 

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 
1475 1480 1485 

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 
1490 1495 1500 

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 
1505 1510 1515 1520 

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 
1525 1530 1535 

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 
1540 1545 1550 

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 
1555 1560 1565 

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 
1570 1575 1580 

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 
1585 1590 1595 1600 

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 
1605 1610 1615 

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 
1620 1625 1630 

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 
1635 1640 1645 

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 
1650 1655 1660 

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 
1665 1670 1675 1680 

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 
1685 1690 1695 

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 
1700 1705 1710 

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 
1715 1720 1725 

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 
1730 1735 1740 

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 
1745 1750 1755 1760 

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 
1765 1770 1775 

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 
1780 1785 1790 

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 
1795 1800 1805 

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 
1810 1815 1820 

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 
1825 1830 1835 1840 

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 
1845 1850 1855 

Val Gly Ser Val Asp Cys Phe Asn Tyr Ile Val Ser Asn Ile Pro Thr 
1860 1865 1870 

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys 
1875 1880 1885 

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 
1890 1895 1900 

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 
1905 1910 1915 1920 

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 
1925 1930 1935 

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 
1940 1945 1950 

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 
1955 1960 1965 

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 
1970 1975 1980 

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 
1985 1990 1995 2000 

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 
2005 2010 2015 

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly 
2020 2025 2030 

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 
2035 2040 2045 

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 
2050 2055 2060 

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 
2065 2070 2075 2080 

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 
2085 2090 2095 

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 
2100 2105 2110 

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr 
2115 2120 2125 

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 
2130 2135 2140 

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 
2145 2150 2155 2160 

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 
2165 2170 2175 

Tyr Ser Ala Leu Ile Lys Asp 
2180 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240 

TTACCACTCG ATCCAGACTA CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300 

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTGTTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATCAGGCTG TTAGAGGTTG 420 

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAAGTAG TAGTGATCAA TCCAGGTCCG 540 

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATTCTA GCCCAAATTT GGGTCTTGCT CGCGAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720 

TAGTTGGTGA ATTCAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780 
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGGACACCCG 840 

GGAACAAACC AAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900 

GATTAGCCAG TTTTATCCTA ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960 

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAT CTTTACCAGC 1020 

AAATGGGAGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080 

GTGCAGGATC ATACCCCCTG CTCTGGAGCT ATGCCATGGG AGTAGGGGTG GAACTTGAAA 1140 

ACTCCATGGG AGGTTTGAAC TTTGGTCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200 

GGCAAGAGAT GGTGAGGAGG TCAGCTGGGA AAGTCAGTTC CACATTAGCA TCTGAACTCG 1260 

GTATCACTGC TGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCACACT ACTGAGGACA 1320 

GGACCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTGTC ATTTCTACAC GGTGATCAAA 1380 

GTGAGAATGA GCTACCAGGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGGG 1440 

GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGTCTAGCAG AGCAAGCGAT GCGAGAGCTG 1500 

CCCATCTTCC AACCAGCGCA CCCCTAGACA TTGACACTGC ATCGGAGTCA GGCCAAGATC 1560 

CGCAGGACAG TCGACGGTCA GCTGACGCCC TGCTCAGGCT GCAAGCCATG GCAGGAATCT 1620 

TGGAAGAACA AGGCTCAGAC ACGGACACCC CTAGGGTGTA CAATGACAGA GATCTTCTAG 1680 

ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740 

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCAACCA ACCATCCACT CCTACGACTG 1800 

GGGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860 

CTCAAGGCCG AGCCCATCGG CTCACTGGCC GTCGAGGAAG CCATGGCAGC ATGGTCACAA 1920 

ATATCAGACA ACCCAGGACA GGACCGAACC ACCCGCAAGG AAGAGGAGGC AGGCAGTTCG 1980 

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCAGTGC ACCTCGCATC 2040 

TGCGGTCAGG GATCTGGAGA GAGCGATGAC AACGCTGAAA CTTTGGGAAT CCCCTCAAGA 2100 

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATCATG TTTATGATCA CAGCGGTGAA 2160 

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220 

AGCACCCTCT CAGGAGGAGA CGATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280 

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCT 2400 

AGAGGCAACA ACTTCCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGAACCCC 2460 

GGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGGA CAGACGCGAG ATTAGCCTCA 2520 

TTTGGAGCGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580 

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GTGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640 

GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700 

AATAATGAAG AAGGGGGAGA TTATTATGAT GATGAGCTGT TCTCCGATGT CCAAGACATC 2760 

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820 

CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAAAAGC AGATCAACAG GCAAAATATC 2880 

AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940 

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAACTCAATC CCGACCTGAA ACCCATCATA 3000 

GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060 

CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAA 3120 

CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCC 3180 

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240 

CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ACGATCTTGC CAAGTTCCAC 3300 

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCTCATG 3360 

CCAATCGACC TAATTAGTAC AGCCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420 

GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480 

AAGGGTCGAT CGCTCCGATA CAACCTACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540 

TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTACG TACATGTTTC 3600 

TGCTGGGGGT TGTTGAGGAC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 3660 

CCCTGCCCTT AGGTGTTGGT AGATCCACAG CAAAACCCGA AGAACTCCTC AAAGAGGCCA 3720 

CTGAGCTTGA CATAGTCGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780 

ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840 

TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTGGATACC CCGCAGAGGT 3900 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 3960 

GAAGAATGCT AGAATTCAGA TCGGTCAATG CAGTGGCTTT CAACCTGCTG GTGACCCTTA 4020 

GGATTGACAA AGCGATTGGC CCTGGGAAGA TCATCGATAA TGCAGAGCAA CTTCCTGAGG 4080 

CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCTG 4140 

ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200 

GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260 

GGTTCAAAAA GACCTTATGT TACCCACTGA TGGATATCAA TGAAGACCTT AATCGATTAC 4320 

TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCCC 4380 

AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440 

TGTAGACCGT AGTGCCCAGC AATACCCGAA AACGACCCCC CTCATAATGA CAGCCAGAAG 4500 

GCCCGGACAA AAAAGCCCCC TCCAAAAGAC TCCACGGACC AAGTGAGAGG CCAGCCAGCA 4560 

GCTGACGGCA AGCGTGAACA CCAGGCGGCC TGGGCACAGA ACAGCCCCGA CACAAGGCAA 4620 

CCACCAGCCA TCCCAATCTG CGTCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGT 4680 

CGCCCCCGAC CCAGACCACC AACCGCATCC CCACAGCCCC CGGGAAAGAG ACCCCCAGCA 4740 

ACTGGAAGGC CCCTCCCCCT TTCCCTCAAC GCAAGAACTC CACAACCGAA CCGCACAAGC 4800 

GATCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC CCCCCGGCAA 4860 

ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCGAC AGAACCCAGA CCCCGGCCCA 4920 

CGGCGCCGCG CCCCCACCTC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980 

CCCCGGTGCC CACAGGCAGG CACACCAACC CTCGAACAGA CCCAGCACCC AGCCATCGAC 5040 

AATTCAAGAC GGGGGGCCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100 

GAGGAAGCCC ACCCACCCCA CACACGACCA CAGGAACCGA ACCAGAATCC AGACCACCCT 5160 

GGGCCACCAG TTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 5220 

ACCCCTGCCC TGATCCGGTG GGCGGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280 

CGAAGGGCCC CCGAACCGCA AAAGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340 

CTCCCCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAATTC 5400 

CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460 

GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520 

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGGAT AGGAAGTGCA 5580 

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640 

ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700 

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760 

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCTG GAGTTGTCCT GGCGGGTGCG 5820 

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880 

TTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 5940 

GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000 

ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 6060 

CTAGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CACTATTTGG CCCCAGCTTA 6120 

CGGGACCGCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAT 6180 

ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240 

AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTACTC 6300 

AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAAGGG 6360 

GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 6420 

CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480 

GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 6540 

TCCACCAAGT CCTGTGCTT C G TAClAfTrTClTA XX>AJ\. TTGGGAACCG GTTCATTTTA 6600 

TCACAAGGGA ATCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 6660 

ACGATCATTA ATCAGGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 6720 

GTGGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGCGGTATCC GGACGCTGTG 6780 

TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 6840 

AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 6900 

CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTTTACAT CCTGATTGCA 6960 

GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 7020 

AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGA 7080 

ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 7140 

CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCATCCA GCATCGAGCC CACCTGAAAT 7200 

TGTCTCCGGA TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 7260 

ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ACAACCCCCA 7320 

TCCTAGGGGA AGTAGGATAG TTATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 7380 

TTTGCTGGCT GTTCTATTCG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 7440 

CATAAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 7500 

TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 7560 

GATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACCGACC TAGTGAAATT 7620 

CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 7680 

TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 7740 

GGCTGCTGAA GAACTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGG CCAGGGTAAC 7800 

CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 7860 

ATTCTCAAAC ATGTCGCTGT CCCTGTTGGA CTTGTATTTA AATCGAGGTT ACAATGTGTC 7920 

ATCTATAGTC ACTATGACAT CCCAGGGAAT GTACGGGGGA ACTTACCTAG TGGAAAAGCC 7980 

TAATCTGAGC AGTAAAGGGT CAGAGTTGTC ACAACTGAGC ATGCACCGAG TGTTTGAAGT 8040 

AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATTTTGA 8100 

GCAACCAGTC AGTAATGATT TCAGCAACTG CATGGTGGCT TTGGGGGAGC TCAAATTCGC 8160 

AGCCCTTTGT CACAGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 8220 

CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280 

CCCCCTATCA ACGGATGATC CAGTGATAGA CAGGCTCTAC CTCTCATCTC ACAGAGGCGT 8340 

TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGGACAGATG ACAAGTTGCG 8400 

AATGGAGACA TGCTTCCAGC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460 

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTAATCT 8520 

GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCAGGA TTCGGGCCAT TGATCACACA 8580 

CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640 

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC TACCTCTTCA CTGTTCCAAT TAAGGAAGCA GGCGAGGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820 

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTATGAT ACTTCCAGAG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940 

GCCTATAAGG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGATATA TCACTCACTC 9060 

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACTCGG GAAGATGGAA CCAACCGCAG 9120 

ATAGGGCTGC CAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240 

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300

ATAAGATAGT AGCTATCCTG GAGTATGCTC GAGTCCCTCA CGCATACAGC CTGGAGGACC 9360

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGACCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CAAGGAAGAT CCGTGAGCTC CTCAAAAAGG GAAATTCGCT GTACTCCAAA GTCAGTGATA 9600 

AGGTTTTCCA ATGCCTGAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAATT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAATGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780 

CCCATACTTG CCATAGGAGG AGACACACAC CAGTATTCTT CACTGGTAGT TCAGTTGAGT 9840

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900 

TGACGTTTGA ACTGGTCTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CCGCTATGAC CATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTACCAAATT GTAGCCATGC 10080 

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200

ATGAAGGTAC TTATCATGAG TTAATTGAAG CCCTAGATTA CATTTTCATA ACTGATGACA 10260 

TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320 

CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380 

AGACTCTGAT GAAAGGTCAT GCCATATTCT GTGGAATCAT AATCAACGGC TATCGTGACA 10440 

GGCACGGAGG CAGTTGGCCA CCCCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500

ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560 

TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620 

ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 10680 

AGTTCCTGCG TTACGACCCT CCCAAAGGAA CTGGGTCACG GAGGCTTGTA AATGTTTTCC 10740 

TTAATGATTC GAGCTTTGAC CCATATGACA TGATAATGTA TGTTGTAAGT GGAGCTTACC 10800 

TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 10860 

GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 10920 

TAATCTCAAA CGGGATTGGC AATTATTTTA AGGACAATGG GATGGCCAAG GACGAGCACG 10980

ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 11040

GTCACAGGGG GGGGCCAGTC TTAAAAACCC ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 11100

AGAACGTGAG AGCAGCAAAA GGGTTTATAG GATTCCCTCA TGTAATTCGG CAGGACCAAG 11160

ACACTGATCA TCCGGAGAAT ATGGAGGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 11220

ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTATTT GCACAAAGGC 11280

TAAATGAGAT TTACGGATTA CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAAACCT 11340

CTGTCCTCTA TGTAAGTGAC CCTCATTGCC CCCCTGACCT TGACGCCCAT GTCCCGTTAT 11400 

GCAAAGTCCC CAATGACCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 11460 

GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATTTATA CCTGGCTGCT TATGAGAGCG 11520 

GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 11580

TACCCAGCAC ATGGCCTTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 11640

ACTTTGTAAT TCTTAGGCAA AGGCTACATG ACATAGGCCA TCACCTCAAG GCAAATGAGA 11700
CAATTGTCTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760

TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820 

AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880 

ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATCCTGATCT 11940 

CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 12000 

ACAACGATCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATCGGGGGG ATGAATTATC 12060 

TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120 

ATCTCAAGAG AATGATTCTC TCATCACTAA TGCCTGAAGA GACCCTTCAT CAAGTAATGA 12180 

CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240 

TTGTATGCGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 12300 

TCCATAGTCC AAACCCAATG TTAAAAGGGT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360

AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTA GATACCACAA 12480 

AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540 

TGTCCAATTA TGACTATGAA CAATTTAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 12600

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCTAGAGCC CTAAGAAGCC 12660 

ATATGTGGGC AAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720 

TAGAATCTAT GCGAGGCCAC CTTATTCGGC GCCATGAGAC ATGTGTCATC TGCGAGTGTG 12780 

GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840 

AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900

TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960

CAGTGTACTC ATGGGCTTAT GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020

CAAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080

CGACTAATTT AGCGCATAGG TTGAGGGATC GTACCACTCA AGTGAAATAC TCAGGTACAT 13140 

CCCTTGTCCG AGTGGCAAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200 

CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAGGG AATGCTTCTA GGGTTGGGTG 13260
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380
CCCGCAAGCT AGAGCTTAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 13440 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 13560 

CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620

TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTTATAGAG CCAAGATTAT 13680

TCACTATCTA CTTGGGCCAG TGTGCAGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740 

GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCTTC GTTCCTTTCT AGAATGAGCA 13800 

AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860

GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTACACA 13920 

CAACTGTGTG CAACATGATT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980 

AAGAGTTAGA AGAGTTCACA TTTCTTCTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 14040 

GATTCGACAA TATCCAGGCA AAACACTTGT GTGTTCTAGC AGATTTGTAC TGTCAACCAG 14100 

GGACCTGCCC ACCAATTCGA GGTCTACGAC CTGTAGAGAA ATGTGCAGTT CTAACCGATC 14160 

ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGGTCTTC GTGGAACATA AATCCAATTA 14220 

TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 14280 

GATTGAGAGT TGATCCAGGA TTCATTTTTG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340 

CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCTCCACACG 14400

ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460

GGGGTAATCT CGCCAATTAT GAAATCCACG CTTTCCGCAG AATCGGGTTA AACTCATCCG 14520

CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 14580 

ACGGCTTGTT CTTGGGTGAG GGGTCGGGTT CTATGTTGAT CACTTATAAG GAGATACTAA 14640 

AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700 

AATTAGCACC CTATCCCTCC GAAGTTGGTC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760 

TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 14820 




TCAATTTCAT AGTCAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880

AGACCTTACC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTAGCAGCC ATCTTATCGA 14940 

TGGCTCTGCT CCTTGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000 

GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTTA TTATAGAGAA GTGAACCTTG 15060 

TCTACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTAGTCATG ACAGATCTCA 15120 

AAGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGC 15180 

GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240 

CAATTGTGGG AGACGCAGTT AGTAGAGGTG GTATCAACCC TATTCTGAAG AAACTTACAC 15300 

CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAA CTGTGCAAAG 15360 

AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAACTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCATG 15480

CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540

TTTGGGGGCA TATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATCGG TTTATCCAGA 15600

ATCTCAAGTC CGGTTACCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660

CTAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTAAA ACGTGAGTGG GTTTTTAAGG 15720

TAACAATCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780

ATTAATTGGT TGGACTCCGG GACCCTAATC CTGCCCTAGG TAGTTAGGCA TTATTTGCAA 15840 

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Thr His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125 

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp 
130 135 140 

Ile Lys Glu Lys Ile Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160 

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg 
165 170 175 

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270 

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445 

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 
450 455 460 

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495 

Arg Leu Val Asn Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 
500 505 510 

Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560
Glu Asn Leu Ile Ser Asn Gly Ile Gly Asn Tyr Phe Lys Asp Asn Gly
565 570 575 

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 
580 585 590 

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 
595 600 605 

Val Leu Lys Thr His Ser Arg Ser Pro Val His Thr Ser Thr Lys Asn 
610 615 620 

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro His Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700 

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Val 
705 710 715 720 

Pro Leu Cys Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780 

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr 
785 790 795 800 

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830
Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910 

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ser Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020 

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 
1025 1030 1035 1040 

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100 

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215 

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 
1220 1225 1230 

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245 

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 
1250 1255 1260 

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Thr Thr Gln
1285 1290 1295 

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 
1300 1305 1310 

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340 

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 
1345 1350 1355 1360 

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390
Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500 

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Ile Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645 

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660
Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775 

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 
1780 1785 1790 

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820 

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 
1825 1830 1835 1840 

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser Tyr
1925 1930 1935 

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Gly Ile Asn Pro
2005 2010 2015

Ile Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060 

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Arg Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Ile Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT 
TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 
TGAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 
GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATTCCTGGA GATTCCTCAA 
TTACCACTCG ATCCAGACTA CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATCAGGCTG TTAGAGGTTG 
TTCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 
ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAAGCAG TAGTGATCAA TCCAGGTCCG
GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGATCCTGAG GGATTCAACA
TGATTCTGGG TACCATTCTA GCCCAGATCT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 
TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 
AGGACCTCTC TTTACGCCGA TTCATGGTGG CTCTAATCCT GGATATCAAG AGGACACCCG 
GGAACAAACC TAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCTTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAT CTTTACCAGC 
AAATGGGAGA AACTGCACCC TACATGGTAA TCCTAGAGAA CTCAATTCAG AACAAGTTCA 
GCGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGTCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 
GGCAAGAGAT GGTGAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCCGAACTCG
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320

GGATCAGTAG AGCGGTCGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 1380 

GTGAGAATGA GCTACCAGGA TTGGGGGGCA AGGAAGACAG GAGGGTCAAA CAGAGTCGGG 1440 

GAGAAGCCAG GGAGAGCTAC AGAGAAACCG AGTCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500 

CCCATCCTCC AACCAGCATG CCCCTAGACA TTGACACTGC ATCGGAGTCA GGCCAAGATC 1560 

CGCAGGACAG TCGAAGGTCA GCTGACGCTC TGCTCAGGCT GCAAGCCATG GCAGGAATCT 1620 

TGGAAGAACA AGGCTCAGAC ACGGACACCC CTAGGGTATA CAATGACAGA GATCTTCTAG 1680 

ATTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740 

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCAACCA ACCATCCACT CCCACGACTG 1800

GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860 

CTCAAGGCCG AGCCCATCGG CTCACTGGCC GTCGAGGAAG CCATGGCAGC ATGGTCAGAA 1920 

ATATCAGACA ATCCAGGACA GGACCGAGCC GCCTGCAAGG AAGAGGAGGC AGGCAGTTCG 1980 

GGTCTCAGCA AACCATGCTT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040 

CGCGGTCAGG GATCTGGAGA AAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCTCAAGA 2100 

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATCATG TTTATGATCA CAGCGGTGAA 2160

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220 

AGCACCCTCT CAGGAGGAGA CGATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280 

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340 

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAA ACTCCAATCC 2400

AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGAACCCC 2460 

AGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGGA CAGACGCGAG ATTGGCCTCA 2520 

TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580 

CCCTCGGAAC CGTCAGGGCC AGATGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640

GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700 

AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCCGATGT CCAAGACATC 2760 

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820
TTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TTGCCATTCC TGGACTTGGG
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAACTCAATC CCGACCTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AGCCCGTTGC CAGCCGACAA 
CTCCAGGGAA TGACTAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAA
CTAAAGCCGA TCGGGAAAAA GGTGAGCTCA GCCGTCGGGT TTGTCCCTGA CACCGGCCCT 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGT TGACTCTCCT TGATGATATC AAAGGAGCCA ACGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 
CCAGTCGACC TAATTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCTAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCTACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGTGATA GGAAGGATGA ATGCTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGAGATCCCC TAGGGCCTCC AATCGGGCGA GCATTCGGGT 
CCCTGCCCTT AGGTGTTGGT AGATCCACAG CAAAACCCGA GGAACTCCTC AAAGAGGCCA
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACCCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAATGCAAA CCAAGTGTGC AATGCGGTTA ATCTAATACC GCTGGACACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCCA 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTA GTGACCCTCA 
GGATTGACAA GGCGATTGGC CCTGGGAAGA TCATCGACAA TGCAGAGCAA CTTCCTGAGG
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG
GGTTCAAGAA GACCTTATGT TACCCACTGA TGGATATCAA TGAAGACCTT AATCGGTTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440

TGTAGACCGT AGTGCCCAGC AATACCCGAA AACGACCCCC CTCATAATGA CAGCCAGAAG 4500

GCCCGGACAA AAAAGCCCCC TCCAAAAGAC TTCACGGACC AAGCGAGAGG CCAGCCAGCA 4560

GCCGACAGCA AGTGTGGACA CCAGGCGGCC CAAGCACAGA ACAGCCCCGA CACAAGGCCA 4620

CCACCAGCCA TCCCAATCCG CGTCCTCCTC GTAGGACCCC CGAGGACCAA CCCCCAAGGT 4680

CGCTCCGGAC ACAGACCACC AGCCGCATCC CCACAGCCCT CGGGAAAGGA ACCCCCAGCA 4740

ACTGGAAGGC CCCTTCCCCC CTCCCCCAAC GCAAGAACCC CACAACCGAA CCGCACAAGC 4800

GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGACCCTCCC TCCCCGGCAT 4860

ACTAAACAAA ACTTAGGGCC AAGGAACACA CACACCCGAC AGAACCCAGA CCCCGGCCCG 4920

CGGCACCGCG CCCCCACCCC CCGAAAACCA GAGGGAGCCC CCAACCAATC CCGCCGCCCC 4980

CCCCGGTGCC CACAGGTAGG CACACCAACC CCCGAACAGA CCCAGCACCC AGCCACCGAC 5040

AATCCAAGAC GGGGGGCCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCATCGC 5100

GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAGCCC AGACCACCCT 5160

GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGAAAAAA GGAAAGGCCA CAACCCGCGC 5220

ACCCCAGGCC CGATCCGGCG GGAAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280

TGGGGGACCC CCAAACCGCA AAAGACATCA GTATCCCACC GCCTCTCCAA GTCCCCCGGT 5340

CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CATCCGACGA CACTCAATTC 5400

CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460

GGTCTCAAGG TGAATGTCTT TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGGAT AGGAAGTGCA 5580

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTGG TCATAAAATT AATGCCCAAT 5640

ATAACTCTCC TCAATAACTG CACGAGGGTA GAAATTGCAG AATACAGGAG ACTACTGAGA 5700

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTTGTCCT GGCAGGTGCG 5820

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880


CTGAACTCTC AAGCCATCGA CAATCTGAGA GCAAGCCTGG AAACTACTAA TCAGGCAATT 5940
GAGGCAATCA GGCAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTAGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCCATC CAGGCTTTGA GCTATGCGCT TGGGGGAGAT 
ATCAATAAGG TATTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ATATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TCCACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTCT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTGGTCGAGG TGAACGGTGT GACCATCCAA GTCGGGAGCA GGAGGTATCC GGACGCGGTG 
TACCTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAAGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGCTGGAG GATGCCAAGG AATTGCTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTTTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GGGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGG 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCCCTACAA CTCTTGAAAC ACAGATTTCC 
CACAAGTCTC CTCTCCGTCA TCAAGCAACC ACCGCATCCA GCATCAAGGC CACCCGAAAT 
TGTCTCCGGC TTCCCTCTGG CCGAACGATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC ACCGAGACCG AATAAATGCC TTCTACAAAG ACAACCCCCA
TCCTAAGGGA AGTAGGATAG TTATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT
TTTGCTGGCT GTTCTATTCG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG
CATTAGACTC CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAGAGCC TCAGCACCAA
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 7560

GATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 7620

CATCTCTGAC AAAATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 7680

TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 7740 

GGCTGCTGAA GAACTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGG CCAGGGCAAC 7800

CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 7860 

ATTCTCAAAC ATGTCGCTGT CCCTGTTGGA CTTGTATTTA AGTCGAGGTT ACAATGTGTC 7920

ATCTATAGTC ACCATGACAT CCCAGGGAAT GTACGGGGGA ACTTACCTAG TGGGAAAGCC 7980

TAATCTGAGC AGTAAAGGGT CAGAGTTGTC ACAACTGAGC ATGCACCGAG TGTTTGAAGT 8040

AGGGGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATTTTGA 8100 

GCAACCAGTC AGTAATGATT TCAGCAACTG CATGGTGGCT TTGGGGGAGC TCAGGTTCGC 8160 

AGCCCTCTGT CACAGGGAAG ATTCTGTCAC GGTTCCCTAT CAGGGGTCAG GGAAAGGTGT 8220 

CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280

CCCCCTATCA ACGGATGATC CAGTGATAGA TAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340 

TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGGACAGATG ACAAGTTGCG 8400 

AATGGAGACA TGCTTCCAGC AGGCGTGTAA GGGTAAAAAC CAAGCACTCT GCGAGAATCC 8460

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTAATCT 8520 

GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCAGGA TTCGGGCCAT TGATCACACA 8580

CGGTTCAGGG ATGGACCTAT ACAAAACCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640 

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC AACCTCTTCA CTGTTCCAAT CAAGGAAGCA GGCGAGGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTAATTCT 8820 

ACCTGGTCAG GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880 

TGTGGTTTAT TATGTTTACA GCCCAGGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940 

GCCTATAAAG GGGGTCCCAA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA TTCAGAATCT GGTGGACATA TCACTCACTC 9060

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACTCGG GAAGATGGAA CCAATCGCAG 9120

ATAGGGCTGC CAGTGAACCG ATCACATGAT GTCACTCAGA CACCAGGCAT ACCCACTAGT 9180

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTTCCC GTCATGGACT 9240

CGCTATCTGT CAACCAGATC TTGTACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCTATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTTGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTCTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CAAGGAAGAT CCGTGAGCTC CTAAAAAAGG GAAATTCGCT GTACTCCAAA GTCAGTGATA 9600 

AGGTTTTCCA ATGCCTGAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAATT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAATGGTTTG 9720

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGC 9840

TGTTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAGGA GTCTCAACAT GTATATTACC 9900 

TGACGTTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960

CCGCTATGAC CATTGATGCT AGGTATGCAG AACTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCTATGC 10080 

TGGAGCCACT TTCACTTGCT TACCTGCAAC TGAGGGACAT AACAGTAGAA CTCAGAGGTG 10140

CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200

ATGAAGGTAC TTATCATGAG TTAATTGAAG CCTTAGATTA CATTTTCATA ACTGATGACA 10260 

TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320 

CAGTAACGGC TGCTGAAAAT GTCAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380

AGACTCTGAT GAAGGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440 

GGCACGGAGG CAGTTGGCCA CCCCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500 

ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAGATCAT 10560

TTGCTGGAGT GAGATTTGGC TGTTTTATGC CTCTTAGCCT GGACAGTGAT CTGACAATGT 10620
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG
AGTTCCTGCG TTACGATCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC QAGCTTTGAC CCATATGATA TGATAATGTA TGTCGTAAGT GGAGCCTACC 
TCCATGACCC TGAGTTCAAT CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT CGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATC GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAGTATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTGGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTTAA AGCAGAAAAA GGGTTTGTAG GATTCCCTCA TGTAATTCGG CAGAATCAAG 
ACACTGATCA TCCGGAGAAT ATAGAAACCT ACGAGACAGT CAGCGCATTT ATCACGACTG
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTATTT GCACAGAGGC 
TAAATGAGAT TTACGGATTA CCCTCATTTT TTCAGTGGCT GCATAAGAGG CTTGAAACCT 
CTGTCCTCTA TGTAAGTGAT CCTCATTGCC CCCCCGACCT TGACGCCCAT GTCCCGTTAT 
GCAAAGTCCC CAATGACCAA ATCTTCATCA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTACTTATA CCTGGCTGCT TATGAGAGCG 
GGGTAAGGAT TGCCTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCTTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ACATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATTGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA TCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTTTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGAGATGT AGTCATACCC CTCCTCACAA 
ACAACGATCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAACATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCATCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240

TTGTATGCGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTAA 12300 

TCCATAGTCC AAACCCAATG TTAAAAGGGT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360 

AGAGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420 

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTA GATACCACAA 12480

AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540

TGTCCAATTA TGACTATGAA CAATTTAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 12600

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCTAGAGCC CTAAGAAGCC 12660 

ATATGTGGGC AAGACTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720 

TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 12780

GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840 

AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900 

TGAAGCTTGC CTTCGTAAGA GCCCCAAGTA GATCCTTGCG ATCTGCCGTT AGAATAGCAA 12960 

CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020

CAAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCGACTT 13080 

CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 13140

CCCTTGTCAG AGTGGCAAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200 

CAGATAAGAA AGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 13260 

TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACTGGATC ATCTAACACG GTATTACATC 13320 

TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380 

CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 13440 

CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500 

AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTCT AGCTAAGTCC ACAGCACTAT 13560

CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620

TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTTATAGAG CCAAGATTAT 13680

TCACCATCTA CTTGGGCCAG TGTGCAGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCTTC GTTCCTTTCT AGAATGAGCA
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTTGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTTTTGT GTGAAAGCGA TGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTGT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCGATTCGA GGTCTAAGGC CGGTAGAGAA ATGTGCAGTT CTAACCGATC
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGTCG AGGATCTATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTTG ATGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGGTCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCTCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGTAGTCT TGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTA AACTCATCTG
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG
ACGGCTTGTT CTTGGGTGAG GGGTCGGGTT CTATGTTGAT CACTTATAAG GAGATACTAA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAGGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT ATAGATTGCT 
TCAATTTCAT AGTCAGTAAT ATCCCTACCT CTAGTGTGGG ATTTATCCAT TCAGATATAG 
AGACCTTACC CAACAAAGAT ACTATAGAGA AGTTAGAGGA ATTGGCAGCC ATCTTATCGA
TGGCTCTACT CCTTGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGCTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 
TCTACCCTAG GTACAGCAAC TTCATATCTA CTGAATCTTA TTTAGTTATG ACAGATCTCA 
AAGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGC 
GGACTTCACC TGGACTTATA GGTCACATCC TATCTATCAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGGCGCAGTT AGTAGAGGTG ATATCAACCC TATTCTGAAA AAACTTACAC
CTATAGAGCA GGTGCTGATC AGTTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360

AATTAATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAACTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480 

CTTACCCCGT ATTGGTAAGT AGTAGGCAAC GAGAACTTGT ATCTAGGATC ACTCGCAAAT 15540 

TTTGGGGGCA TATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATCGG TTTATCCAGA 15600 

ATCTCAAGTC CGGTTATCTA ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTAAA ACGTGAGTGG GTTTTTAAGG 15720 

TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGCGCT CTGATTAAGG 15780 

ATTAATTGGT TGAACTCCGG AACCCTAATC CTACCCTAGG TAGTTAGGCA TTATTTGCAA 15840

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Ile Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Ala Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350 

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Arg Ser Phe Ala Gly Val Arg Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Lys Ala Glu Lys Gly Phe Val Gly Phe Pro His Val Ile Arg Gln
625 630 635 640

Asn Gln Asp Thr Asp His Pro Glu Asn Ile Glu Thr Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Val
705 710 715 720

Pro Leu Cys Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Arg Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200 



Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Val Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Ser Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775 

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 
1780 1785 1790 

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Ile Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Gly Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Ile Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Ser Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045 

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Val
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Arg Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180 

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15894 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240

TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480

ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 540

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600

TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720

TGGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780

AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840

GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900

GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 1020

AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080

GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140

ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200

GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260

GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320

AGATCAGTAG AGCGGTTGGA                       ATTTCTACAC GGTGATCAAA 1380

GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 1440

GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500

CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC AACGGAGTCC AGCCAAGATC 1560

CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 1620

CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 1680

ACTAGGTGCG AGAGGCCGAG GGCCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 1800

GAGCCAATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860

CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920

ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 1980 

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040 

CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 2100 

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTACG TTTATGATCA CAGCGGTGAA 2160 

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220 

AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400 

AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 2460

GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520 

TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580 

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640 

GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700

AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760 

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820 

CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880 

AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000 

GGCAGAGATT CAGGCCQAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060 

CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120 

CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 3180 

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240

CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 3300 

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360

CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420

GCCTCCCAAG TTCCACAATG ACAGAGACCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480 

AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540 

TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 3600 

TGCTGGGGGT TGTTGAGGAC AGCGATTCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 3660 

CCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 3720 

CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780 

ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840 

TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 3900 

TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 3960 

GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 4020 

GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 4080 

CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 4140 

ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200 

GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260 

GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 4320 

TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 4380 

AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440

TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 4500 

GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 4560 

GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCCGA CACAAGGCCA 4620 

CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 4680 

TGCCCCCGAT CCAAACCACC AACCGGATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 4740 

ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 4800 

GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 4860 

ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 4920

CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980 

CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 5040 

AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100

GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 5160 

GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 5220 

ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280 

CGAAGGACCC CCGAACCGCA AAGGACACCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340 

CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 5400

CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460 

GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520 

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 5580 

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640

ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700 

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760 

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 5820 

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880

CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 5940 

GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000 

ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 6060 

CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGTTTA 6120

CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 6180

ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240 

AGCGGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 6300

AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 6360 

GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 6420

CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480

GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 6540

TACACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 6600

TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 6660

ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 6720

GTAGTCGAGG TGAACGGCGT GATCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 6780

TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 6840

AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 6900

CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 6960

GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 7020

AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 7080

ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 7140

CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 7200

TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 7260

ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 7320

TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 7380

TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 7440

CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 7500

TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 7560

AATCATCGGT GATGAAGTGG                       TTCACTGACC TAGTGAAATT 7620

CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 7680

TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 7740

GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 7800

CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 7860

ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 7920

ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 7980

TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 8040

AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 8100

GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 8160 

AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 8220 

CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280 

CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340 

TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 8400 

AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 8520 

GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580 

CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC TACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820 

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940 

GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060 

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 9120 

ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240 

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600

AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900

TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960

CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140

CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200

ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 10260

TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320

CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380

AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440

GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500

ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560

TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620

ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 10680

AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 10740

TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 10800

TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 10860

GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 10920

TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 10980

ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 11040

GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 11100

GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 11160

ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 11220

ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 11280

TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 11340 

CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 11400

ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 11460

GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 11520 

GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 11580

TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 11640

ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 11700 

CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760

TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820 

AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880 

ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 11940 

CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 12000 

ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 12060

TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120 

ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 12180 

CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240

TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 12300

TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360

AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420 

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 12480

AAGGCTTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540 

TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGC 12600 

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 12660 

ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720

TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 12780

GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840 

AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900

TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960 

CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020 

CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080 

CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 13140 

CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200 

CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 13260 

TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320 

TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380 

CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAG 13440 

CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500

AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 13560

CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620 

TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 13680 

TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740 

GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 13800 

AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860 

GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 13920 

CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980 

AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 14040 

GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 14100 

GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 14160 

ATATCAAGGC AGAGGCTATG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 14220 

TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 14280

GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340

CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 14400

ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460

GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 14520

CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 14580

ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 14640

AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700

AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760

TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 14820

TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880

AGACCTTGCC TGACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 14940

TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000

GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 15060

TATACCCTAG ATACAGCAAC TTCATCTCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 15120

AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 15180

GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240

CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 15300

CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360

AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480

CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540

TCTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600

ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720

TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780

ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Met Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asp Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240 

TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTGAGGTT AATTGGAAAC CCGGATGTGA 300 

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420 

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 540 

GATGGTTCGG GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720

TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780

AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840

GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900

GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 1020

AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080

GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140

ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200

GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260

GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320

AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 1380

GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 1440

GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500

CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC AACGGAGTCC AGCCAAGATC 1560

CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 1620

CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 1680

ACTAGGTGCG AGAGGCCGAG GGCCAGAACA ACATCCGCCT ACCATCCATC ATTGTTATAA 1740

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 1800

GAGCCAATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860

CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920

ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 1980

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040

CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 2100

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTACG TTTATGATCA CAGCGGTGAA 2160

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220

AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400 

AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 2460 

GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520 

TTTGGAACGG AGATCGCOTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580 

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640 

GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700 

AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820 

CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880 

AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940 

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000

GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060 

CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120 

CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 3180 

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240 

CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 3300 

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360 

CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420 

GCCTCCCAAG GTCCACAATG ACAGAGACCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480 

AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540 

TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 3600

TGCTGGGGGT TGTTGAGGAC AGCGATTCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 3660 

TCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 3720 

CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780 

ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840

TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 3900

TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTGCTA 3960

GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 4020

GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 4080

CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 4140 

ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200 

GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260

GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 4320

TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 4380

AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440

TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 4500 

GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 4560 

GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA CACAAGGCCA 4620

CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 4680 

TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 4740 

ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 4800

GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 4860 

ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGTCCA 4920 

CGGTGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980 

CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 5040 

AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100 

GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 5160 

GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 5220 

ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280 

CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340 

CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 5400

CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460

GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 5580

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640

ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 5820

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880

CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 5940

GAGACAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000

ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 6060

CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGTTTA 6120

CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 6180

ATCAATAAGG TGTXAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240

AGCGGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 6300

AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 6360

GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 6420

CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480

GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 6540

TACACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 6600

TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 6660

ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 6720

GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 6780

TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 6840

AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 6900

CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 6960






GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CAC CTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATCAAAACTT AGGGTG CAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CAT CGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTGTTCAA 
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 
AATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA
GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 
AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580

CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640 

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC TACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820 

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940

GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060 

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 9120 

ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600 

AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900 

TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080
TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCATGAG TTAACTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 
TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA
AAGGCTTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 13260
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 13440
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 13560
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 13680
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 13800
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 13920
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980
AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 14040
GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 14100
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 14160
ATATCAAGGC AGAGGCTATG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 14220
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 14280
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 14400
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 14520
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 14580
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAA GAGATACTTA 14640
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 14820
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880
AGACCTTGCC TGACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 14940
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 15060
TATACCCTAG ATACAGCAAC TTCATCTCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 15120
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 15180
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 15300
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540
TCTGGGGGCA CATTCTTCTT TACTCCGGGA ACAAAAAGTT GATAAATAAG TTTATCCAGA 15600
ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720
TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780
ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2183 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Thr Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100






Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Met Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asp Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Lys Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15894 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TTTTCTAGTG CACTTAGGAT 60

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAAGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240 

TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 540 

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATCCTA GCTCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720 

TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780

AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840 

GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900 

GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960 

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 1020 

AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080 

GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140 

ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200 

GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260 




GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320 

AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 1380 

GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 1440 

GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500

CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 1560 

CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 1620 

CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 1680 

ACTAGGTGCG AGAGGCCGAG GGCCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 1800 

GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860 

CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920 

ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 1980 

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040 

CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 2100 

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA 2160 

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220 

AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280 

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340 

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400 

AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC TCCGGACCCC 2460 

GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520 

TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580 

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640 

GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700 

AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTTGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CCCTGCCCTT AGGTGTTGGC AAATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AGTGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG
GGTTCAAGAA GACCTTATGT TACCCGCTGA TAGATATCAA TGAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440

TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 4500 

GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 4560 

GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA TACAAGGCCA 4620 

CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 4680 

TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 4740 

ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 4800 

GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 4860 

ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 4920 

CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980 

CCCCGGTGCC GACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 5040 

AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100 

GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 5160 

GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 5220 

ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280

CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340 

CTCCTCCCCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 5400 

CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460 

GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520 

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 5580 

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640 

ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700 

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760 

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 5820 

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880

CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 5940 




GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000 

ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 6060 

CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 6120

CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 6180 

ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240 

AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 6300 

AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 6360 

GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 6420 

CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480 

GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 6540

TCCACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 6600 

TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 6660

ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 6720 

GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGATGCTGTG 6780 

TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 6840

AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 6900 

CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 6960 

GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 7020 

AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 7080 

ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 7140 

CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 7200

TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 7260 

ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 7320 

TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 7380 

TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 7440 

CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 7500

TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 7560

AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 7620

CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 7680

TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 7740

GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 7800

CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 7860

ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 7920

ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 7980

TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 8040

AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 8100

GCAACCAGCC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 8160

AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 8220

CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280

CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340

TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 8400

AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 8520

GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580

CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT [illegible] [illegible] 8640

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700

GGTTAGTCCC TACCTCTTCA ATGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880

TGTGGTTTAT TACGTTTACA GCCCAGGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940

GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG
ATAGGGCTGC TAGTGAACCA ATCTCATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300

ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540

CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600

AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900

TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960

CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140

CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200

ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 10260

TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320

CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380

AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440

GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500

ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560

TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620

ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 10680

AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 10740

TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 10800

TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 10860

GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 10920

TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 10980

ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 11040

GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 11100

GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 11160

ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 11220

ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 11280

TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 11340

CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 11400

ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 11460

GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 11520

GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 11580

TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 11640

ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 11700

CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760

TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820

AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880

ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 11940

CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 12000

ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 12060

TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120

ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 12180

CACAACAACC QGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 12300

TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360

AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 12480

AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540

TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 12600

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 12660

ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720

TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 12780

GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840

AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900

TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960

CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020

CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080

CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 13140

CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200

CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 13260

TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320

TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380

CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 13440

CTTTAATTGA CAGAGATACA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500

AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 13560

CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620

TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 13680

TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740

GACCATCAGG GAAATATCAO ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 13800
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860

GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 13920

CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980

AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 14040

GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 14100

GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 14160

ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 14220

TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 14280

GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340

CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 14400

ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460

GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 14520

CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 14580

ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 14640

AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700

AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760

TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 14820

TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880

AGACCTTGCC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 14940

TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000

GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCCCA TTATAGAGAA GTGAACCTTG 15060

TATACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 15120

AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCtGTGA 15180

GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240

CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 15300

CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360

AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480 

CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540 

TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600 

ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720 

TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780

ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840

TAGATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Thr Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240

TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300 

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CAAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 540

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720 

TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780

AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840 

GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900 

GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960 

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 1020 

AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080 

GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140

ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200 

GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260 

GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320 

AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 1380

GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 1440

GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500 

CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 1560 

CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 1620 

CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 1680

ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 1800

GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860 

CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920 

ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 1980 

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040

CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 2100 

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA 2160 

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220

AGCACCCTAT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340 

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400 

AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 2460

GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520 

TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTA TGTGAGCAAT 2640

GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700 

AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760 

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820

CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880 

AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940 

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000

GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060 

CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120 

CCAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 3180 

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240 

CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 3300

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360

CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420 

GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480 

AAGGGTCGAT CGCTCCGATA CAACCGACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540 

TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTCTC 3600 

TGCTGGGGGT TGTTGAGGAC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 3660

CCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 3720 

CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780 

ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840

TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 3900 

TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 3960

GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 4020

GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 4080 

CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 4140 

ATTATTGGAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200

GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260 

GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 4320 

TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 4380 

AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440 

TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACARTGA CAGCCAGAAG 4500 

GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACTGACC AAGCGAGAGG CCAGCCAGCA 4560 

GCCGACGGCA AGCACGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA TACAAGGCCA 4620 

CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 4680 

TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 4740

ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 4800 

GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 4860 
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ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCT AACCATCGAC 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 
CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AG ATGCA CTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GACAAG CAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATT TACTGGG CAT CTT AG AG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 



4920 
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CAAGGGTACC 


TTATCTCGAA 


TTTTGATGAG 


TCATCGTGTA 


CTTTCATGCC 


AGAGGGGACT 


6480 


GTGTGCAGCC 


AAAATGCCTT 


GTACCCGATG 


AGTCCTCTGC 


TCCAAGAATG 


CCTCCGGGGG 


6540 


TACACCAAGT 


CCTGTGCTCG 


TACACTCGTA 


TCCGGGTCTT 


TTGGGAACCG 


GTTCATTTTA 


6600 


TCACAAGGGA 


ACCTAATAGC 


CAATTGTGCA 


TCAATCCTTT 


GCAAGTGTTA 


CACAACAGGA 


6660 


ACGATCATTA 


ATCAAGACCC 


TGACAAGATC 


CTAACATACA 


TTGCTGCCGA 


TAACTGCCCG 


6720 


GTAGTCGAGG 


TGAACGGCGT 


GACCATCCAA 


GTCGGGAGCA 


GGAGGTATCC 


AGACGCTGTG 


6780 


TACTTGCACA 


GAATTGACCT 


CGGTCCTCCC 


ATATTATTGG 


AGAGGTTGGA 


CGTAGGGACA 


6840 


AATCTGGGGA 


ATGCAATTGC 


TAAGTTGGAG 


GATGCCAAGG 


AATTGTTGGA 


GTCATCGGAC 


6900 


CAGATATTGA 


GGAGTATGAA 


AGGTTTATCG 


AGCACTTGCA 


TAGTCTACAT 


CCTGATTGCA 


6960 


GTGTGTCTTG 


GAGGGTTGAT 


AGGGATCCCC 


GCTTTAATAT 


GTTGCTGCAG 


GGGGCGTTGT 


7020 


AACAAAAAGG 


GAGAACAAGT 


TGGTATGTCA 


AGACCAGGCC 


TAAAGCCTGA 


TCTTACGGGA 


7080 


A CAT CAAAAT 


CCTATGTAAG 


GTCGCTCTGA 


TCCTCTACAA 


CTCTTGAAAC 


ACAAATGTCC 


7140 


CACAAGTCTC 


CTCTTCGTCA 


TCAAGCAACC 


ACCGCACCCA 


GCATCAAGCC 


CACCTGAAAT 


7200 


TATCTCCGGC 


TTCCCTCTGG 


CCGAACAATA 


TCGGTAGTTA 


ATTAAAACTT 


AGGGTGCAAG 


7260 


ATCATCCACA 


ATGTCACCAC 


AACGAGACCG 


GATAAATGCC 


TTCTACAAAG 


ATAACCCCCA 


7320 


TCCCAAGGGA 


AGTAGGATAG 


TCATTAACAG 


AGAACATCTT 


ATGATTGATA 


GACCTTATGT 


7380 


TTTGCTGGCT 


GTTCTGTTTG 


TCATGTTTCT 


GAGCTTGATC 


GGGTTGCTAG 


CCATTGCAGG 


7440 


CATTAGACTT 


CATCGGGCAG 


CCATCTACAC 


CGCAGAGATC 


CATAAAAGCC 


TCAGCACCAA 


7500 


TCTAGATGTA 


ACTAACTCAA 


TCGAGCATCA 


GGTCAAGGAC 


GTGCTGACAC 


CACTCTTCAA 


7560 


AATCATCGGT 


GATGAAGTGG 


GCCTGAGGAC 


ACCTCAGAGA 


TTCACTGACC 


TAGTGAAATT 


7620 


CATCTCTGAC 


AAGATTAAAT 


TCCTTAATCC 


GGATAGGGAG 


TACGACTTCA 


GAGATCTCAC 


7680 


TTGGTGTATC 


AACCCGCCAG 


AGAGAATCAA 


ATTGGATTAT 


GATCAATACT 


GTGCAGATGT 


7740 


GGCTGCTGAA 


GAGCTCATGA 


ATGCATTGGT 


GAACT CAACT 


CTACTGGAGA 


CCAGAACAAC 


7800 


CAATCAGTTC 


CTAGCTGTCT 


CAAAGGGAAA 


CTGCTCAGGG 


CCCACTACAA 


TCAGAGGTCA 


7860 


ATTCTCAAAC 


ATGTCGCTGT 


CCCTG TTAGA 


CTTGTATTTA 


GGTCGAGGTT 


ACAATGTGTC 


7920 


ATCTATAGTC 


ACTATGACAT 


CCCAGGGAAT 


GTATGGGGGA 


ACTTACCTAG 


TGGAAAAGCC 


7980 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 8040

AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 8100

GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 8160

AGCCCTTTGT CACCGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 8220

CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280

CACCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340

TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 8400

AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 8520

GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580

CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640

ACCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700

GGTTAGTCCC TACCTCTTCA ATGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940

GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 9120

ATAGGGCTGC TAGTGAACTA ATCTCATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300

ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540
CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600

AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780 

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900

TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960

CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080 

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140 

CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200 

ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 10260 

TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320

CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380

AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440 

GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500 

ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560 

TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620 

ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 10680 

AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 10740 

TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 10800 

TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 10860 

GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 10920 

TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 10980 

ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 11040 

GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 11100 
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 11160

ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 11220

ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 11280 

TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 11340 

CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 11400 

ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 11460

GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 11520 

GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 11580 

TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 11640 

ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 11700 

CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760

TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820 

AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880

ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 11940 

CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 12000 

ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 12060

TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120 

ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 12180 

CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240 

TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 12300 

TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360 

AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420 

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 12480 

AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540 

TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 12600 

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 12660

ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720

TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 12780 

GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840 

AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900 

TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960

CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020 

CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080

CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 13140 

CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200 

CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 13260 

TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320

TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380 

CGCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 13440 

CTTTAATTGA CAGAGATACA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500 

AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 13560 

CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620

TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 13680 

TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740

GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 13800 

AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860

GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 13920 

CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980 

AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 14040 

GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 14100 

GGGCCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 14160 
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 14220






TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 
AGACCTTGCC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTTA TTATAGAGAA GTGAACCTTG
TATACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCGAAGAAG TCAACAAGGG ATGTTCCACG
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 
TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 
ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 
TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG
ACTAATTGAT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175
Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735
Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005
Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Thr Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565
Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Ala Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840
Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser Tyr
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Arg Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15462 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

ACCAAACAAG AGAAGAAACT TGTCTGGGAA TATAAATTTA ACTTTAAATT AACTTAGGAT 60 

TAAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA ATGTTGAGCC 120

TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC AAAATCAGCC GGTGGAGCTA 180 

TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT TGGACCGACA ATAACTGATG 240 

ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA TTCACTAGAT AATGAGAAAC 300

AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT GCCAATCCAG 360

AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGTCAA GTATGTCATA TACATGATTG 420

AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA GAGATGATAT 480 

ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT 540

TGCAGAACGG CAGGAACAAT TCAACAATTG AAGACCTTGT CCACACATTT GGGTATCCAT 600 

CATGTTTAGG AGCTCTTATA ATACAGATCT GGATAGTTCT GGTCAAAGCT ATCACTAGTA 660

TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 720 
TGCAGGCAGG GCTGGTATTG AGCGGTGACA CAGTGGATCA GATTGGGTCA ATCATGCGGT 780

CTCAACAGAG CTTGGTAACT CTTATGGTTG AAACATTAAT AACAATGAAT ACCAGCAGAA 840 

ATGACCTCAC AACCATAGAA AAGAATATAC AAATTGTTGG CAACTACATA AGAGATGCAG 900 

GTCTCGCTTC ATTCTTCAAT ACAATCAGAT ATGGAATTGA GACCAGAATG GCAGCTTTGA 960 

CTCTATCCAC TCTCAGACCA GATATCAATA GATTAAAAGC TTTGATGGAA CTGTATTTAT 1020 

CAAAGGGACC ACGCGCTCCT TTCATCTGTA TCCTCAGAGA TCCTATACAT GGTGAGTTCG 1080

CACCAGGCAA CTATCCTGCC ATATGGAGCT ATGCAATGGG GGTGGCAGTT GTACAAAATA 1140 

GAGCCATGCA ACAGTATGTG ACGGGAAGAT CATATCTAGA CATTGATATG TTCCAGCTAG 1200 

GACAAGCAGT AGCACGTGAT GCCGAAGCTC AAATGAGCTC AACACTGGAA GATGAACTTG 1260 

GAGTGACACA CGAATCTAAA GAAAGCTTGA AGAGACATAT AAGGAACATA AACAGTTCAG 1320 

AGACATCTTT CCACAAACCG ACAGGTGGAT CAGCCATAGA GATGGCAATA GATGAAGAGC 1380 

CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA 1440 

TTCAATATGC CTGGGCAGAA GGAAATAGAA GCGATGATCA GACTGAGCAA GCTACAGAAT 1500 

CTGACAATAT CAAGACCGAA CAACAAAACA TCAGAGACAG ACTAAACAAG AGACTCAACG 1560 

ACAAGAAGAA ACAAAGCAGT CAACCACCCA CTAATCCCAC AAACAGAACA AACCAGGACG 1620

AAATAGATGA TCTGTTTAAC GCATTTGGAA GCAACTAATC GAATCAACAT TTTAATCTAA 1680 

ATCAATAATA AATAAGAAAA ACTTAGGATT AAAGAATCCT ATCATACCGG AATATAGGGT 1740

GGTAAATTTA GAGTCTGCTT GAAACTCAAT CAATAGAGAG TTGATGGAAA GCGATGCTAA 1800 

AAACTATCAA ATCATGGATT CTTGGGAAGA GGAATCAAGA GATAAATCAA CTAATATCTC 1860

CTCGGCCCTC AACATCATTG AATTCATACT CAGCACCGAC CCCCAAGAAG ACTTATCGGA 1920 

AAACGACACA ATCAACACAA GAACCCAGCA ACTCAGTGCC ACCATCTGTC AACCAGAAAT 1980 

CAAACCAACA GAAACAAGTG AGAAAGATAG TGGATCAACT GACAAAAATA GACAGTCCGG 2040 

GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAT ATTGATCAGG AAACTGTACA 2100 

GAGAGGACCT GGGAOAAGAA GCAGCTCAGA TAGTAOAGCT GAGACTGTGG TCTCTGGAGG 2160 

AATCCCCAGA AGCATCACAG ATTCTAAAAA TGGAACCCAA AACACGGAGG ATATTGATCT 2220

CAATGAAATT AGAAAGATGG ATAAGGACTC TATTGAGGGG AAAATGCGAC AATCTGCAAA 2280

TGTTCCAAGC GAGATATCAG GAAGTGATGA CATATTTACA ACAGAACAAA GTAGAAACAG 2340

TGATCATGGA AGAAGCCTGG AATCTATCAG TACACCTGAT ACAAGATCAA TAAGTGTTGT 2400 

TACTGCTGCA ACACCAGATG ATGAAGAAGA AATACTAATG AAAAATAGTA GGACAAAGAA 2460

AAGTTCTTCA ACACATCAAG AAGATGACAA AAGAATTAAA AAAGGGGGAA AAGGGAAAGA 2520 

CTGGTTTAAG AAATCAAAAG ATACCGACAA CCAGATACCA ACATCAGACT ACAGATCCAC 2580

ATCAAAAGGG CAGAAGAAAA TCTCAAAGAC AACAACCACC AACACCGACA CAAAGGGGCA 2640 

AACAGAAATA CAGACAGAAT CATCAGAAAC ACAATCCTCA TCATGGAATC TCATCATCGA 2700

CAACAACACC GACCGGAACG AACAGACAAG CACAACTCCT CCAACAACAA CTTCCAGATC 2760

AACTTATACA AAAGAATCGA TCCGAACAAA CTCTGAATCC AAACCCAAGA CACAAAAGAC 2820 

AAATGGAAAG GAAAGGAAGG ATACAGAAGA GAGCAATCGA TTTACAGAGA GGGCAATTAC 2880 

TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAAA CTAGATTTAT ATCAAGACAA 2940 

ACGAGTTGTA TGTGTAGCAA ATGTACTAAA CAATGTAGAT ACTGCATCAA AGATAGATTT 3000 

CCTGGCAGGA TTAGTCATAG GGGTTTCAAT GGACAACGAC ACAAAATTAA CACAGATACA 3060 

AAATGAAATG CTAAACCTCA AAGCAGATCT AAAGAAAATG GACGAATCAC ATAGAAGATT 3120

GATAGAAAAT CAAAGAGAAC AACTGTCATT GATCACGTCA CTAATTTCAA ATCTCAAAAT 3180 

TATGACTGAG AGAGGAGGAA AGAAAGACCA AAATGAATCC AATGAGAGAG TATCCATGAT 3240 

CAAAACAAAA TTGAAAGAAG AAAAGATCAA GAAGACCAGG TTTGACCCAC TTATGGAGGC 3300 

ACAAGGCATT GACAAGAATA TACCCGATCT ATATCGACAT GCAGGAGATA CACTAGAGAA 3360 

CGATGTACAA GTTAAATCAG AGATATTAAG TTCATACAAT GAGTCAAATG CAACAAGACT 3420 

AATACCCAAA AAAGTGAGCA GTACAATGAG ATCACTAGTT GCAGTCATCA ACAACAGCAA 3480 

TCTCTCACAA AGCACAAAAC AATCATACAT AAACGAACTC AAACGTTGCA AAAATGATGA 3540 

AGAAGTATCT GAATTAATGG ACATGTTCAA TGAAGATGTC AACAATTGCC AATGATCCAA 3600 

CAAAGAAACG ACACCGAACA AACAGACAAG AAACAACAGT AGATCAAAAC CTGTCAACAC 3660 

ACACAAAATC AAGCAGAATG AAACAACAGA TATCAATCAA TATACAAATA AGAAAAACTT 3720 

AGGATTAAAG AATAAATTAA TCCTTGTCCA AAATGAGTAT AACTAACTCT GCAATATACA 3780 

CATTCCCAGA ATCATCATTC TCTGAAAATG GTCATATAGA ACCATTACCA CTCAAAGTCA 3840

ATGAACAGAG GAAAGCAGTA CCCCACATTA GAGTTGCCAA GATCGGAAAT CCACCAAAAC 3900

ACGGATCCCG GTATTTAGAT GTCTTCTTAC TCGGCTTCTT CGAGATGGAA CGAATCAAAG 3960

ACAAATACGG GAGTGTGAAT GATCTCGACA GTGACCCGAG TTACAAAGTT TGTGGCTCTG 4020 

GATCATTACC AATCGGATTG GCTAAGTACA CTGGGAATGA CCAGGAATTG TTACAAGCCG 4080 

CAACCAAACT GGATATAGAA GTGAGAAGAA CAGTCAAAGC GAAAGAGATG GTTGTTTACA 4140 

CGGTACAAAA TATAAAACCA GAACTGTACC CATGGTCCAA TAGACTAAGA AAAGGAATGC 4200 

TGTTCGATGC CAACAAAGTT GCTCTTGCTC CTCAATGTCT TCCACTAGAT AGGAGCATAA 4260 

AATTTAGAGT AATCTTCGTG AATTGTACGG CAATTGGATC AATAACCTTG TTCAAAATTC 4320

CTAAGTCAAT GGCATCACTA TCTCTACCCA ACACAATATC AATCAATCTG CAGGTACACA 4380

TAAAAACAGG GGTTCAGACT GATTCTAAAG GGATAGTTCA AATTTTGGAT GAGAAAGGCG 4440 

AAAAATCACT GAATTTCATG GTCCATCTCG GATTGATCAA AAGAAAAGTA GGCAGAATGT 4500 

ACTCTGTTGA ATACTGTAAA CAGAAAATCG AGAAAATGAG ATTGATATTT TCTTTAGGAC 4560 

TAGTTGGAGG AATCAGTCTT CATGTCAATG CAACTGGGTC CATATCAAAA ACACTAGCAA 4620

GTCAGCTGGT ATTCAAAAGA GAGATTTGTT ATCCTTTAAT GGATCTAAAT CCGCATCTCA 4680

ATCTAGTTAT CTGGGCTTCA TCAGTAGAGA TTACAAGAGT GGATGCAATT TTCCAACCTT 4740 

CTTTACCTGG CGAGTTCAGA TACTATCCTA ATATTATTGC AAAAGGAGTT GGGAAAATCA 4800 

AACAATGGAA CTAGTAATCT CTATTTTAGT CCGGACGTAT CTATTAAGCC GAAGCAAATA 4860 

AAGGATAATC AAAAACTTAG GACAAAAGAG GTCAATACCA ACAACTATTA GCAGTCACAC 4920

TCGCAAGAAT AAGAGAGAAG GGACCAAAAA AGTCAAATAG GAGAAATCAA AACAAAAGGT 4980

ACAGAACACC AGAACAACAA AATCAAAACA TCCAACTCAC TCAAAACAAA AATTCCAAAA 5040 

GAGACCGGCA ACACAACAAG CACTGAACAC AATGCCAACT TCAATACTGC TAATTATTAC 5100 

AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 5160 

ATTGGTCAAC AGTCCCAAAG GGATGAAGAT ATCACAAAAC TTTGAAACAA GATATCTAAT 5220 

TTTGAGCCTC ATACCAAAAA TAGAAGACTC TAACTCTTGT GGTGACCAAC AGATCAAGCA 5280

ATACAAGAAG TTATTGGATA GACTGATCAT CCCTTTATAT GATGGATTAA GATTACAGAA 5340 

AGATGTGATA GTAACCAATC AAGAATCCAA TGAAAACACT GATCCCAGAA CAAAACGATT 5400
CTTTGGAGGG GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC 5460

GGCAGTTGCT CTGGTTGAAG CCAAGCAGGC AAGATCAGAC ATCGAAAAAC TCAAAGAAGC 5520

AATTAGGGAC ACAAACAAAG CAGTGCAGTC AGTTCAGAGC TCCATAGGAA ATTTAATAGT 5580

AGCAATTAAA TCAGTCCAGG ATTATGTTAA CAAAGAAATC GTGCCATCGA TTGCGAGGCT 5640

AGGTTGTGAA GCAGCAGGAC TTCAATTAGG AATTGCATTA ACACAGCATT ACTCAGAATT 5700

AACAAACATA TTTGGTGATA ACATAGGATC GTTACAAGAA AAAGGAATAA AATTACAAGG 5760

TATAGCATCA TTATACCGCA CAAATATCAC AGAAATATTC ACAACATCAA CAGTTGATAA 5820

ATATGATATC TATGATCTGT TATTTACAGA ATCAATAAAG GTGAGAGTTA TAGATGTTGA 5880

CTTGAATGAT TACTCAATCA CCCTCCAAGT CAGACTCCCT TTATTAACTA GGCTGCTGAA 5940

CACTCAGATC TACAAAGTAG ATTCCATATC ATATAACATC CAAAACAGAG AATGGTATAT 6000

CCCTCTTCCC AGCCATATCA TGACGAAAGG GGCATTTCTA GGTGGAGCAG ACGTCAAAGA 6060

ATGTATAGAA GCATTCAGCA GCTATATATG CCCTTCTGAT CCAGGATTTG TATTAAACCA 6120

TGAAATAGAG AGCTGCTTAT CAGGAAACAT ATCCCAATGT CCAAGAACAA CGGTCACATC 6180

AGACATTGTT CCAAGATATG CATTTGTCAA TGGAGGAGTG GTTGCAAACT GTATAACAAC 6240

CACCTGTACA TGCAACGGAA TTGGTAATAG AATCAATCAA CCACCTGATC AAGGAGTAAA 6300

AATTATAACA CATAAAGAAT GTAGTACAAT AGGTATCAAC GGAATGCTGT TCAATACAAA 6360

TAAAGAAGGA ACTCTTGCAT TCTATACACC AAATGATATA ACACTAAACA ATTCTGTTGC 6420

ACTTGATCCA ATTGACATAT CAATCGAGCT CAACAAGGCC AAATCAGATC TAGAAGAATC 6480

AAAAGAATGG ATAAGAAGGT CAAATCAAAA ACTAGATTCT ATTGGAAATT GGCATCAATC 6540

TAGCACTACA ATCATAATTA TTTTGATAAT GATCATTATA TTGTTTATAA TTAATATAAC 6600

GATAATTACA ATTGCAATTA AGTATTACAG AATTCAAAAG AGAAATCGAG TGGATCAAAA 6660

TGACAAGCCA TATGTACTAA CAAACAAATA ACATATCTAC AGATCATTAG ATATTAAAAT 6720

TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA 6780

CCCAATAGAC AAATCCAAAT TCGAGATGGA ATACTGGAAG CATACCAATC ACGGAAAGGA 6840

TGCTGGTAAT GAGCTGGAGA CGTCTATGGC TACTCATGGC AACAAGCTCA CTAATAAGAT 6900

AATATACATA TTATGGACAA TAATCCTGGT GTTATTATCA ATAGTCTTCA TCATAGTGCT 6960
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AATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA 7020 

GTTTATGGAA ATT AC AG AAA AGATCCAAAT GGCATCGGAT AATACCAATG ATCTAATACA 7 080 

GTCAGGAGTG AATACAAGGC TTCTTACAAT TCAGAGTCAT GTCCAGAATT ACATACCAAT 7140 

ATCATTGACA CAACAGATGT CAGATCTTAG GAAATTCATT AGTGAAATTA CAATTAGAAA 7200 

TGATAATCAA GAAGTGCTGC CACAAAGAAT AACACATGAT GTAGGTATAA AACCTTTAAA 7260

TCCAGATGAT TTTTGGAGAT GCACGTCTGG TCTTCCATCT TTAATGAAAA CTCCAAAAAT 7320

AAGGTTAATG CCAGGGCCGG GATTATTAGC TATGCCAACG ACTGTTGATG GCTGTGTTAG 7380

AACTCCGTCT TTAGTTATAA ATGATCTGAT TTATGCTTAT ACCTCAAATC TAATTACTCG 7440

AGGTTGTCAG GATATAGGAA AATCATATCA AGTCTTACAG ATAGGGATAA TAACTGTAAA 7500

CTCAGACTTG GTACCTGACT TAAATCCTAG GATCTCTCAT ACCTTTAACA TAAATGACAA 7560 

TAGGAAGTCA TGTTCTCTAG CACTCCTAAA TACAGATGTA TATCAACTGT GTTCAACTCC 7620 

CAAAGTTGAT GAAAGATCAG ATTATGCATC ATCAGGCATA GAAGATATTG TACTTGATAT 7680

TGTCAATTAT GATGGTTCAA TCTCAACAAC AAGATTTAAG AATAATAACA TAAGCTTTGA 7740

TCAACCATAT GCTGCACTAT ACCCATCTGT TGGACCAGGG ATATACTACA AAGGCAAAAT 7800

AATATTTCTC GGGTATGGAG GTCTTGAACA TCCAATAAAT GAGAATGTAA TCTGCAACAC 7860

AACTGGGTGC CCCGGGAAAA CACAGAGAGA CTGTAATCAA GCGTCTCATA GTCCATGGTT 7920

TTCAGATAGG AGGATGGTCA ACTCCATCAT TGTTGTTGAC AAAGGCTTAA ACTCAATTCC 7980

AAAATTGAAA GTATGGACGA TATCTATGCG ACAAAATTAC TGGGGGTCAG AAGGAAGGTT 8040

ACTTCTACTA GGTAACAAGA TCTATATATA TACAAGATCT ACAAGTTGGC ATAGCAAGTT 8100 

ACAATTAGGA ATAATTGATA TTACTGATTA CAGTGATATA AGGATAAAAT GGACATGGCA 8160 

TAATGTGCTA TCAAGACCAG GAAACAATGA ATGTCCATGG GGACATTCAT GTCCAGATGG 8220

ATGTATAACA GGAGTATATA CTGATGCATA TCCACTCAAT CCCACAGGGA GCATTGTGTC 8280

ATCTGTCATA TTAGACTCAC AAAAATCGAG AGTGAACCCA GTCATAACTT ACTCAACAGC 8340 

AACCGAAAGA GTAAACGAGC TGGCCATCCT AAACAGAACA CTCTCAGCTG GATATACAAC 8400 

AACAAGCTGC ATTACACACT ATAACAAAGG ATATTGTTTT CATATAGTAG AAATAAATCA 8460 

TAAAAGCTTA AACACATTTC AACCCATGTT GTTCAAAACA GAGATTCCAA AAAGCTGCAG 8520

TTAATCATAA TTAACCATAA TATGCATCAA TCTATCTATA ATACAAGTAT ATGATAAGTA 8580

ATCAGCAATC AGACAATAGA CAAAAGGGAA ATATAAAAAA CTTAGGAGCA AAGCGTGCTC 8640 

GGGAAATGGA CACTGAATCT AACAATGGCA CTGTATCTGA CATACTCTAT CCTGAGTGTC 8700 

ACCTTAACTC TCCTATCGTT AAAGGTAAAA TAGCACAATT ACACACTATT ATGAGTCTAC 8760

CTCAGCCTTA TGATATGGAT GACGACTCAA TACTAGTTAT CACTAGACAG AAAATAAAAC 8820 

TTAATAAATT GGATAAAAGA CAACGATCTA TTAGAAGATT AAAATTAATA TTAACTGAAA 8880 

AAGTGAATGA CTTAGGAAAA TACACATTTA TCAGATATCC AGAAATGTCA AAAGAAATGT 8940

TCAAATTATA TATACCTGGT ATTAACAGTA AAGTGACTGA ATTATTACTT AAAGCAGATA 9000 

GAACATATAG TCAAATGACT GATGGATTAA GAGATCTATG GATTAATGTG CTATCAAAAT 9060 

TAGCCTCAAA AAATGATGGA AGCAATTATG ATCTTAATGA AGAAATTAAT AATATATCGA 9120 

AAGTTCACAC AACCTATAAA TCAGATAAAT GGTATAATCC ATTCAAAACA TGGTTTACTA 9180

TCAAGTATGA TATGAGAAGA TTACAAAAAG CTCGAAATGA GATCACTTTT AATGTTGGGA 9240 

AGGATTATAA CTTGTTAGAA GACCAGAAGA ATTTCTTATT GATACATCCA GAATTGGTTT 9300 

TGATATTAGA TAAACAAAAC TATAATGGTT ATCTAATTAC TCCTGAATTA GTATTGATGT 9360

ATTGTGACGT AGTCGAAGGC CGATGGAATA TAAGTGCATG TGCTAAGTTA GATCCAAAAT 9420

TACAATCTAT GTATCAGAAA GGTAATAACC TGTGGGAAGT GATAGATAAA TTGTTTCCAA 9480

TTATGGGAGA AAAGACATTT GATGTGATAT CGTTATTAGA ACCACTTGCA TTATCCTTAA 9540

TTCAAACTCA TGATCCTGTT AAACAACTAA GAGGAGCTTT TTTAAATCAT GTGTTATCCG 9600 

AGATGGAATT AATATTTGAA TCTAGAGAAT CGATTAAGGA ATTTCTGAGT GTAGATTACA 9660 

TTGATAAAAT TTTAGATATA TTTAATAAGT CTACAATAGA TGAAATAGCA GAGATTTTCT 9720

CTTTTTTTAG AACATTTGGG CATCCTCCAT TAGAAGCTAG TATTGCAGCA GAAAAGGTTA 9780 

GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 9840 

TCTTCTGTAC AATAATAATT AACGGATATA GAGAGAGGCA TGGTGGACAG TGGCCTCCTG 9900

TGACATTACC TGATCATGCA CACGAATTCA TCATAAATGC TTACGGTTCA AACTCTGCGA 9960 

TATCATATGA AAATGCTGTT GATTATTACC AGAGCTTTAT AGGAATAAAA TTCAATAAAT 10020 

TCATAGAGCC TCAGTTAGAT GAGGATTTGA CAATTTATAT GAAAGATAAA GCATTATCTC 10080

CAAAAAAATC AAATTGGGAC ACAGTTTATC CTGCATCTAA TTTACTGTAC CGTACTAACG 10140

CATCCAACGA ATCACGAAGA TTAGTTGAAG TATTTATAGC AGATAGTAAA TTTGATCCTC 10200 

ATCAGATATT GGATTATGTA GAATCTGGGG ACTGGTTAGA TGATCCAGAA TTTAATATTT 10260 

CTTATAGTCT TAAAGAAAAA GAGATCAAAC AGGAAGGTAG ACTCTTTGCA AAAATGACAT 10320

ACAAAATGAG AGCTACACAA GTTTTATCAG AGACACTACT TGCAAATAAC ATAGGAAAAT 10380 

TCTTTCAAGA AAATGGGATG GTGAAGGGAG AGATTGAATT ACTTAAGAGA TTAACAACCA 10440

TATCAATATC AGGAGTTCCA CGGTATAATG AAGTGTACAA TAATTCTAAA AGCCATACAG 10500

ATGACCTTAA AACCTACAAT AAAATAAGTA ATCTTAATTT GTCTTCTAAT CAGAAATCAA 10560

AGAAATTTGA ATTCAAGTCA ACGGATATCT ACAATGATGG ATACGAGACT GTGAGCTGTT 10620

TCCTAACAAC AGATCTCAAA AAATACTGTC TTAATTGGAG ATATGAATCA ACAGCTCTAT 10680 

TTGGAGAAAC TTGCAACCAA ATATTTGGAT TAAATAAATT GTTTAATTGG TTACACCCTC 10740 

GTCTTGAAGG AAGTACAATC TATGTAGGTG ATCCTTACTG TCCTCCATCA GATAAAGAAC 10800 

ATATATCATT AGAGGATCAC CCTGATTCTG GTTTTTACGT TCATAACCCA AGAGGGGGTA 10860 

TAGAAGGATT TTGTCAAAAA TTATGGACAC TCATATCTAT AAGTGCAATA CATCTAGCAG 10920

CTGTTAGAAT AGGCGTGAGG GTGACTGCAA TGGTTCAAGG AGACAATCAA GCTATAGCTG 10980 

TAACCACAAG AGTACCCAAC AATTATGACT ACAGAGTTAA GAAGGAGATA GTTTATAAAG 11040 

ATGTAGTGAG ATTTTTTGAT TCATTAAGAG AAGTGATGGA TGATCTAGGT CATGAACTTA 11100 

AATTAAATGA AACGATTATA AGTAGCAAGA TGTTCATATA TAGCAAAAGA ATCTATTATG 11160

ATGGGAGAAT TCTTCCTCAA GCTCTAAAAG CATTATCTAG ATGTGTCTTC TGGTCAGAGA 11220 

CAGTAATAGA CGAAACAAGA TCAGCATCTT CAAATTTGGC AACATCATTT GCAAAAGCAA 11280

TTGAGAATGG TTATTCACCT GTTCTAGGAT ATGCATGCTC AATTTTTAAG AACATTCAAC 11340 

AACTATATAT TGCCCTTGGG ATGAATATCA ATCCAACTAT AACACAGAAT ATCAGAGATC 11400 

AGTATTTTAG GAATCCAAAT TGGATGCAAT ATGCCTCTTT AATACCTGCT AGTGTTGGGG 11460 

GATTCAATTA CATGGCCATG TCAAGATGTT TTGTAAGGAA TATTGGTGAT CCATCAGTTG 11520 

CCGCATTGGC TGATATTAAA AGATTTATTA AGGCGAATCT ATTAGACCGA AGTGTTCTTT 11580 

ATAGGATTAT GAATCAAGAA CCAGGTGAGT CATCTTTTTT GGACTGGGCT TCAGATCCAT 11640

ATTCATGCAA TTTACCACAA TCTCAAAATA TAACCACCAT GATAAAAAAT ATAACAGCAA 11700

GGAATGTATT ACAAGATTCA CCAAATCCAT TATTATCTGG ATTATTCACA AATACAATGA 11760

TAGAAGAAGA TGAAGAATTA GCTGAGTTCC TGATGGACAG GAAGGTAATT CTCCCTAGAG 11820 

TTGCACATGA TATTCTAGAT AATTCTCTCA CAGGAATTAG AAATGCCATA GCTGGAATGT 11880 

TAGATACGAC AAAATCACTA ATTCGGGTTG GCATAAATAG AGGAGGACTG ACATATAGTT 11940 

TGTTGAGGAA AATCAGTAAT TACGATCTAG TACAATATGA AACACTAAGT AGGACTTTGC 12000

GACTAATTGT AAGTGATAAA ATCAAGTATG AAGATATGTG TTCGGTAGAC CTTGCCATAG 12060 

CATTGCGACA AAAGATGTGG ATTCATTTAT CAGGAGGAAG GATGATAAGT GGACTTGAAA 12120

CGCCTGACCC ATTAGAATTA CTATCTGGGG TAGTAATAAC AGGATCAGAA CATTGTAAAA 12180 

TATGTTATTC TTCAGATGGC ACAAACCCAT ATACTTGGAT GTATTTACCC GGTAATATCA 12240 

AAATAGGATC AGCAGAAACA GGTATATCGT CATTAAGAGT TCCTTATTTT GGATCAGTCA 12300 

CTGATGAAAG ATCTGAAGCA CAATTAGGAT ATATCAAGAA TCTTAGTAAA CCTGCAAAAG 12360

CCGCAATAAG AATAGCAATG ATATATACAT GGGCATTTGG TAATGATGAG ATATCTTGGA 12420 

TGGAAGCCTC ACAGATAGCA CAAACACGTG CAAATTTTAC ACTAGATAGT CTCAAAATTT 12480

TAACACCGGT AGCTACATCA ACAAATTTAT CACACAGATT AAAGGATACT GCAACTCAGA 12540 

TGAAATTCTC CAGTACATCA TTGATCAGAG TCAGCAGATT CATAACAATG TCCAATGATA 12600

ACATGTCTAT CAAAGAAGCT AATGAAACCA AAGATACTAA TCTTATTTAT CAACAAATAA 12660 

TGTTAACAGG ATTAAGTGTT TTCGAATATT TATTTAGATT AAAAGAAACC ACAGGACACA 12720 

ACCCTATAGT TATGCATCTG CACATAGAAG ATGAGTGTTG TATTAAAGAA AGTTTTAATG 12780

ATGAACATAT TAATCCAGAG TCTACATTAG AATTAATTCG ATATCCTGAA AGTAATGAAT 12840 

TTATTTATGA TAAAGACCCA CTCAAAGATG TGGACTTATC AAAACTTATG GTTATTAAAG 12900 

ACCATTCTTA CACAATTGAT ATGAATTATT GGGATGATAC TGACATCATA CATGCAATTT 12960 

CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA 13020

AAGAGATAAT AGTTATTGCA AATGATGATG ATATTAATAG CTTAATCACT GAATTTTTGA 13080 

CTCTTGACAT ACTTGTATTT CTCAAGACAT TTGGTGGATT ATTAGTAAAT CAATTTGCAT 13140 

ACACTCTTTA TAGTCTAAAA ATAGAAGGTA GGGATCTCAT TTGGGATTAT ATAATGAGAA 13200

CACTGAGAGA TACTTCCCAT TCAATATTAA AAGTATTATC TAATGCATTA TCTCATCCTA 13260

AAGTATTCAA GAGGTTCTGG GATTGTGGAG TTTTAAACCC TATTTATGGT CCTAATACTG 13320

CTAGTCAAGA CCAGATAAAA CTTGCCCTAT CTATATGTGA ATATTCACTA GATCTATTTA 13380

TGAGAGAATG GTTGAATGGT GTATCACTTG AAATATACAT TTGTGACAGC GATATGGAAG 13440

TTGCAAATGA TAGGAAACAA GCCTTTATTT CTAGACACCT TTCATTTGTT TGTTGTTTAG 13500

CAGAAATTGC ATCTTTCGGA CCTAACCTGT TAAACTTAAC ATACTTGGAG AGACTTGATC 13560

TATTGAAACA ATATCTTGAA TTAAATATTA AAGAAGACCC TACTCTTAAA TATGTACAAA 13620

TATCTGGATT ATTAATTAAA TCGTTCCCAT CAACTGTAAC ATACGTAAGA AAGACTGCAA 13680

TCAAATATCT AAGGATTCGC GGTATTAGTC CACCTGAGGT AATTGATGAT TGGGATCCGG 13740

TAGAAGATGA AAATATGCTG GATAACATTG TCAAAACTAT AAATGATAAC TGTAATAAAG 13800

ATAATAAAGG GAATAAAATT AACAATTTCT GGGGACTAGC ACTTAAGAAC TATCAAGTCC 13860

TTAAAATCAG ATCTATAACA AGTGATTCTG ATGATAATGA TAGACTAGAT GCTAATACAA 13920

GTGGTTTGAC ACTTCCTCAA GGAGGGAATT ATCTATCGCA TCAATTGAGA TTATTCGGAA 13980

TCAACAGCAC TAGTTGTCTG AAAGCTCTTG AGTTATCACA AATTTTAATG AAGGAAGTCA 14040

ATAAAGACAA GGACAGGCTC TTCCTGGGAG AAGGAGCAGG AGCTATGCTA GCATGTTATG 14100

ATGCCACATT AGGACCTGCA GTTAATTATT ATAATTCAGG TTTGAATATA ACAGATGTAA 14160

TTGGTCAACG AGAATTGAAA ATATTTCCTT CAGAGGTATC ATTAGTAGGT AAAAAATTAG 14220

GAAATGTGAC ACAGATTCTT AACAGGGTAA AAGTACTGTT CAATGGGAAT CCTAATTCAA 14280

CATGGATAGG AAATATGGAA TGTGAGAGCT TAATATGGAG TGAATTAAAT GATAAGTCCA 14340

TTGGATTAGT ACATTGTGAT ATGGAAGGAG CTATCGGTAA ATCAGAAGAA ACTGTTCTAC 14400

ATGAACATTA TAGTGTTATA AGAATTACAT ACTTGATTGG GGATGATGAT GTTGTTTTAG 14460

TTTCCAAAAT TATACCTACA ATCACTCCGA ATTGGTCTAG AATACTTTAT CTATATAAAT 14520

TATATTGGAA AGATGTAAGT ATAATATCAC TCAAAACTTC TAATCCTGCA TCAACAGAAT 14580

TATATCTAAT TTCGAAAGAT GCATATTGTA CTATAATGGA ACCTAGTGAA ATTGTTTTAT 14640

CAAAACTTAA AAGATTGTCA CTCTTGGAAG AAAATAATCT ATTAAAATGG ATCATTTTAT 14700

CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCAA AGAAGGAGAA AGAGATTATG 14760

GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC 14820

ATCTGGCGAA AGAATTTTTA TCAACCCCAG ATCTGACTAA TATCAACAAT ATAATCCAAA 14880

GTTTTCAGCG AACAATAAAG GATGTTTTAT TTGAATGGAT TAATATAACT CATGATGATA 14940

AGAGACATAA ATTAGGCGGA AGATATAACA TATTCCCACT GAAAAATAAG GGAAAGTTAA 15000

GACTGCTATC GAGAAGACTA GTATTAAGTT GGATTTCATT ATCATTATCG ACTCGATTAC 15060

TTACAGGTCG CTTTCCTGAT GAAAAATTTG AACATAGAGC ACAGACTGGA TATGTATCAT 15120

TAGCTGATAC TGATTTAGAA TCATTAAAGT TATTGTCGAA AAACATCATT AAGAATTACA 15180

GAGAGTGTAT AGGATCAATA TCATATTGGT TTCTAACCAA AGAAGTTAAA ATACTTATGA 15240

AATTGATTGG TGGTGCTAAA TTATTAGGAA TTCCCAGACA ATATAAAGAA CCCGAAGACC 15300

AGTTATTAGA AAACTACAAT CAACATGATG AATTTGATAT CGATTAAAAC ATAAATACAA 15360

TGAAGATATA TCCTAACCTT TATCTTTAAG CCTAGGAATA GACAAAAAGT AAGAAAAACA 15420

TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT 15462

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2233 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro
1 5 10 15

Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln Leu
20 25 30

His Thr Ile Met Ser Leu Pro Gln Pro Tyr Asp Met Asp Asp Asp Ser
35 40 45

Ile Leu Val Ile Thr Arg Gln Lys Ile Lys Leu Asn Lys Leu Asp Lys
50 55 60

Arg Gln Arg Ser Ile Arg Arg Leu Lys Leu Ile Leu Thr Glu Lys Val
65 70 75 80

Asn Asp Leu Gly Lys Tyr Thr Phe Ile Arg Tyr Pro Glu Met Ser Lys
85 90 95

Glu Met Phe Lys Leu Tyr Ile Pro Gly Ile Asn Ser Lys Val Thr Glu
100 105 110

Leu Leu Leu Lys Ala Asp Arg Thr Tyr Ser Gln Met Thr Asp Gly Leu
115 120 125

Arg Asp Leu Trp Ile Asn Val Leu Ser Lys Leu Ala Ser Lys Asn Asp
130 135 140

Gly Ser Asn Tyr Asp Leu Asn Glu Glu Ile Asn Asn Ile Ser Lys Val
145 150 155 160

His Thr Thr Tyr Lys Ser Asp Lys Trp Tyr Asn Pro Phe Lys Thr Trp
165 170 175

Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gln Lys Ala Arg Asn Glu
180 185 190

Ile Thr Phe Asn Val Gly Lys Asp Tyr Asn Leu Leu Glu Asp Gln Lys
195 200 205

Asn Phe Leu Leu Ile His Pro Glu Leu Val Leu Ile Leu Asp Lys Gln
210 215 220

Asn Tyr Asn Gly Tyr Leu Ile Thr Pro Glu Leu Val Leu Met Tyr Cys
225 230 235 240

Asp Val Val Glu Gly Arg Trp Asn Ile Ser Ala Cys Ala Lys Leu Asp
245 250 255

Pro Lys Leu Gln Ser Met Tyr Gln Lys Gly Asn Asn Leu Trp Glu Val
260 265 270

Ile Asp Lys Leu Phe Pro Ile Met Gly Glu Lys Thr Phe Asp Val Ile
275 280 285

Ser Leu Leu Glu Pro Leu Ala Leu Ser Leu Ile Gln Thr His Asp Pro
290 295 300

Val Lys Gln Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met
305 310 315 320

Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val
325 330 335

Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp
340 345 350

Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro
355 360 365

Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile
370 375 380

Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala Ile Phe
385 390 395 400

Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp
405 410 415

Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala
420 425 430

Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr
435 440 445

Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu
450 455 460

Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys
465 470 475 480

Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg
485 490 495

Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala
500 505 510

Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly
515 520 525

Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu
530 535 540

Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys
545 550 555 560

Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile
565 570 575

Gly Lys Phe Phe Gln Glu Asn Gly Met Val Lys Gly Glu Ile Glu Leu
580 585 590

Leu Lys Arg Leu Thr Thr Ile Ser Ile Ser Gly Val Pro Arg Tyr Asn
595 600 605

Glu Val Tyr Asn Asn Ser Lys Ser His Thr Asp Asp Leu Lys Thr Tyr
610 615 620

Asn Lys Ile Ser Asn Leu Asn Leu Ser Ser Asn Gln Lys Ser Lys Lys
625 630 635 640

Phe Glu Phe Lys Ser Thr Asp Ile Tyr Asn Asp Gly Tyr Glu Thr Val
645 650 655

Ser Cys Phe Leu Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Ser Thr Ala Leu Phe Gly Glu Thr Cys Asn Gln Ile Phe Gly
675 680 685

Leu Asn Lys Leu Phe Asn Trp Leu His Pro Arg Leu Glu Gly Ser Thr
690 695 700

Ile Tyr Val Gly Asp Pro Tyr Cys Pro Pro Ser Asp Lys Glu His Ile
705 710 715 720

Ser Leu Glu Asp His Pro Asp Ser Gly Phe Tyr Val His Asn Pro Arg
725 730 735

Gly Gly Ile Glu Gly Phe Cys Gln Lys Leu Trp Thr Leu Ile Ser Ile
740 745 750

Ser Ala Ile His Leu Ala Ala Val Arg Ile Gly Val Arg Val Thr Ala
755 760 765

Met Val Gln Gly Asp Asn Gln Ala Ile Ala Val Thr Thr Arg Val Pro
770 775 780

Asn Asn Tyr Asp Tyr Arg Val Lys Lys Glu Ile Val Tyr Lys Asp Val
785 790 795 800

Val Arg Phe Phe Asp Ser Leu Arg Glu Val Met Asp Asp Leu Gly His
805 810 815

Glu Leu Lys Leu Asn Glu Thr Ile Ile Ser Ser Lys Met Phe Ile Tyr
820 825 830

Ser Lys Arg Ile Tyr Tyr Asp Gly Arg Ile Leu Pro Gln Ala Leu Lys
835 840 845

Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr
850 855 860

Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu
865 870 875 880

Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn
885 890 895

Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile
900 905 910

Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln
915 920 925

Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn Tyr Met Ala
930 935 940

Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala
945 950 955 960

Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser
965 970 975

Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn
995 1000 1005

Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp
1010 1015 1020

Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu
1025 1030 1035 1040

Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu
1045 1050 1055

Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg
1060 1065 1070

Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val
1075 1080 1085

Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser
1090 1095 1100

Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu
1105 1110 1115 1120

Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu
1125 1130 1135

Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg
1140 1145 1150

Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly
1155 1160 1165

Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp
1170 1175 1180

Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile
1185 1190 1195 1200

Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly
1205 1210 1215

Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn
1220 1225 1230

Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr
1235 1240 1245

Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile
1250 1255 1260

Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr
1265 1270 1275 1280

Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Leu Lys Asp Thr Ala
1285 1290 1295

Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe
1300 1305 1310

Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr
1315 1320 1325

Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser
1330 1335 1340

Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro
1345 1350 1355 1360

Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser
1365 1370 1375

Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg
1380 1385 1390

Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp
1395 1400 1405

Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile
1410 1415 1420

Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile
1425 1430 1435 1440

Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp
1445 1450 1455

Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser
1460 1465 1470

Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr
1475 1480 1485

Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu
1490 1495 1500

Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu
1505 1510 1515 1520

Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser
1525 1530 1535

His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro
1540 1545 1550

Ile Tyr Gly Pro Asn Thr Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu
1555 1560 1565

Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn
1570 1575 1580

Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala
1585 1590 1595 1600

Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys
1605 1610 1615

Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr
1620 1625 1630

Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile
1635 1640 1645

Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile
1650 1655 1660

Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys
1665 1670 1675 1680

Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp
1685 1690 1695

Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile
1700 1705 1710



Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe
1715 1720 1725

Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile
1730 1735 1740

Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly
1745 1750 1755 1760

Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu
1765 1770 1775

Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln
1780 1785 1790

Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly
1795 1800 1805

Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro
1810 1815 1820

Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly
1825 1830 1835 1840

Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys
1845 1850 1855

Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe
1860 1865 1870

Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser
1875 1880 1885

Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys
1890 1895 1900

Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu
1905 1910 1915 1920

His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val
1925 1930 1935

Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg
1940 1945 1950

Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser
1955 1960 1965

Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys
1970 1975 1980

Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys
1985 1990 1995 2000

Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile
2005 2010 2015

Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys
2020 2025 2030

Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu
2035 2040 2045

Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe
2050 2055 2060

Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe
2065 2070 2075 2080

Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His
2085 2090 2095

Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu
2100 2105 2110

Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser
2115 2120 2125

Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro
2130 2135 2140

Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala
2145 2150 2155 2160

Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys
2165 2170 2175

Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys
2180 2185 2190

Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly
2195 2200 2205

Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr
2210 2215 2220

Asn Gln His Asp Glu Phe Asp Ile Asp
2225 2230

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15462 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
ACCAAACAAG AGAAGAAACT TGCTTGGTAA TATAAATTTA ACTTAAAATT AACTTAGGAT 60

TTAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA ATGTTGAGCC 120

TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC AAAATCAGCC GGTGGAGCTA 180

TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT TGGACCGACA ATAACTGATG 240

ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA TTCACTAGAT AATGAGAAAC 300

AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT GCCAATCCAG 360

AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGCCAA GTATGTCATA TACATGATTG 420

AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA GAGATGATAT 480

ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT 540

TGCAGAACGG CAGGAACAAT TCAACAATTG AAGACCTTGT CCACACATTT GGGTATCCAT 600

CATGTTTAGG AGCTCTTATA ATACAGATCT GGATAGTTCT GGTCAAAGCT ATCACTAGTA 660

TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 720

TGCAGGCAGG GCTGGTATTG AGCGGTGACA CAGTGGATCA GATTGGGTCA ATCATGCGGT 780

CTCAACAGAG CTTGGTAACT CTTATGGTTG AAACATTAAT AACAATGAAT ACCAGCAGAA 840

ATGACCTCAC AACCATAGAA AAGAATATAC AAATTGTTGG CAACTACATA AGAGATGCAG 900

GTCTCGCTTC ATTCTTCAAT ACAATCAGAT ATGGAATTGA GACCAGAATG GCAGCTTTGA 960

CTCTATCCAC TCTCAGACCA GATATCAATA GATTAAAAGC TTTGATGGAA CTGTATTTAT 1020

CAAAGGGACC ACGCGCTCCT TTCATCTGTA TCCTCAGAGA TCCTATACAT GGTGAGTTCG 1080

CACCAGGCAA CTATCCTGCC ATATGGAGCT ATGCAATGGG GGTGGCAGTT GTACAAAATA 1140

GAGCCATGCA ACAGTATGTG ACGGGAAGAT CATATCTAGA CATTGATATG TTCCAGCTAG 1200

GACAAGCAGT AGCACGTGAT GCCGAAGCTC AAATGAGCTC AACACTGGAA GATGAACTTG 1260

GAGTGACACA CGAAGCTAAA GAAAGCTTGA AGAGACATAT AAGGAACATA AACAGTTCAG 1320

AGACATCTTT CCACAAACCG ACAGGTGGAT CAGCCATAGA GATGGCAATA GATGAAGAGC 1380
CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA 1440

TTCAATATGC CTGGGCAGAA GGAAATAGAA GCGATGATCA GACTGAGCAA GCTACAGAAT 1500

CTGACAATAT CAAGACCGAA CAACAAAACA TCAGAGACAG ACTAAACAAG AGACTCAACG 1560

ACAAGAAGAA ACAAAGCAGT CAACCACCCA CTAATCCCAC AAACAGAACA AACCAGGACG 1620

AAATAGATGA TCTGTTTAAC GCATTTGGAA GCAACTAATC GAATCAACAT TTTAATCTAA 1680

ATCAATAATA AATAAGAAAA ACTTAGGATT AAAGAATCCT ATCATACCGG AATATAGGGT 1740

GGTAAATTTA GAGTCTGCTT GAAACTCAAT CAATAGAGAG TTGATGGAAA GCGATGCTAA 1800

AAACTATCAA ATCATGGATT CTTGGGAAGA GGAATCAAGA GATAAATCAA CTAATATCTC 1860

CTCGGCCCTC AACATCATTG AATTCATACT CAGCACCGAC CCCCAAGAAG ACTTATCGGA 1920

AAACGACACA ATCAACACAA GAACCCAGCA ACTCAGTGCC ACCATCTGTC AACCAGAAAT 1980

CAAACCAACA GAAACAAGTG AGAAAGATAG TGGATCAACT GACAAAAATA GACAGTCTGG 2040

GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAC ATTGATCAGG AAACTGTACA 2100

GAGAGGACCT GGGAGAAGAA GCAGCTCAGA TAGTAGAGCT GAGACTGTGG TCTCTGGAGG 2160

AATCCCCAGA AGCATCACAG ATTCTAAAAA TGGAACCCAA AACACGGAGG ATATTGATCT 2220

CAATGAAATT AGAAAGATGG ATAAGGACTC TATTGAGGGG AAAATGCGAC AATCTGCAAA 2280

TGTTCCAAGC GAGATATCAG GAAGTGATGA CATATTTACA ACAGAACAAA GTAGAAACAG 2340

TGATCATGGA AGAAGCCTGG AATCTATCAG TACACCTGAT ACAAGATCAA TAAGTGTTGT 2400

TACTGCTGCA ACACCAGATG ATGAAGAAGA AATACTAATG AAAAATAGTA GGACAAAGAA 2460

AAGTTCTTCA ACACATCAAG AAGATGACAA AAGAATTAAA AAAGGGGGAA AAGGGAAAGA 2520

CTGGTTTAAG AAATCAAAAG ATACCGACAA CCAGATACCA ACATCAGACT ACAGATCCAC 2580

ATCAAAAGGG CAGAAGAAAA TCTCAAAGAC AACAACCACC AACACCGACA CAAAGGGGCA 2640

AACAGAAATA CAGACAGAAT CATCAGAAAC ACAATCCTCA TCATGGAATC TCATCATCGA 2700

CAACAACACC GACCGGAACG AACAGACAAG CACAACTCCT CCAACAACAA CTTCCAGATC 2760

AACTTATACA AAAGAATCGA TCCGAACAAA CTCTGAATCC AAACCCAAGA CACAAAAGAC 2820

AAATGGAAAG GAAAGGAAGG ATACAGAAGA GAGCAATCGA TTTACAGAGA GGGCAATTAC 2880

TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAAA CTAGATTTAT ATCAAGACAA 2940
ACGAGTTGTA TGTGTAGCAA ATGTACTAAA CAATGTAGAT ACTGCATCAA AGATAGATTT 3000

CCTGGCAGGA TTAGTCATAG GGGTTTCAAT GGACAACGAC ACAAAATTAA CACAGATACA 3060

AAATGAAATG CTAAACCTCA AAGCAGATCT AAAGAAAATG GACGAATCAC ATAGAAGATT 3120

GATAGAAAAT CAAAGAGAAC AACTGTCATT GATCACGTCA CTAATTTCAA ATCTCAAAAT 3180

TATGACTGAG AGAGGAGGAA AGAAAGACCA AAATGAATCC AATGAGAGAG TATCCATGAT 3240

CAAAACAAAA TTGAAAGAAG AAAAGATCAA GAAGACCAGG TTTGACCCAC TTATGGAGGC 3300

ACAAGGCATT GACAAGAATA TACCCGATCT ATATCGACAT GCAGGAGATA CACTAGAGAA 3360

CGATGTACAA GTTAAATCAG AGATATTAAG TTCATACAAT GAGTCAAATG CAACAAGACT 3420

AATACCCAAA AAAGTGAGCA GTACAATGAG ATCACTAGTT GCAGTCATCA ACAACAGCAA 3480

TCTCTCACAA AGCACAAAAC AATCATACAT AAACGAACTC AAACGTTGCA AAAATGATGA 3540

AGAAGTATCT GAATTAATGG ACATGTTCAA TGAAGATGTC AACAATTGCC AATGATCCAA 3600

CAAAGAAACG ACACCGAACA AACAGACAAG AAACAACAGT AGATCAAAAC CTGTCAACAC 3660

ACACAAAATC AAGCAGAATG AAACAACAGA TATCAATCAA TATACAAATA AGAAAAACTT 3720

AGGATTAAAG AATAAATTAA TCCTTGTCCA AAATGAGTAT AACTAACTCT GCAATATACA 3780

CATTCCCAGA ATCATCATTC TCTGAAAATG GTCATATAGA ACCATTACCA CTCAAAGTCA 3840

ATGAACAGAG GAAAGCAGTA CCCCACATTA GAGTTGCCAA GATCGGAAAT CCACCAAAAC 3900

ACGGATCCCG GTATTTAGAT GTCTTCTTAC TCGGCTTCTT CGAGATGGAA CGAATCAAAG 3960

ACAAATACGG GAGTGTGAAT GATCTCGACA GTGACCCGAG TTACAAAGTT TGTGGCTCTG 4020

GATCATTACC AATCGGATTG GCTAAGTACA CTGGGAATGA CCAGGAATTG TTACAAGCCG 4080

CAACCAAACT GGATATAGAA GTGAGAAGAA CAGTCAAAGC GAAAGAGATG GTTGTTTACA 4140

CGGTACAAAA TATAAAACCA GAACTGTACC CATGGTCCAA TAGACTAAGA AAAGGAATGC 4200

TGTTCGATGC CAACAAAGTT GCTCTTGCTC CTCAATGTCT TCCACTAGAT AGGAGCATAA 4260

AATTTAGAGT AATCTTCGTG AATTGTACGG CAATTGGATC AATAACCTTG TTCAAAATTC 4320

CTAAGTCAAT GGCATCACTA TCTCTAACCA ACACAATATC AATCAATCTG CAGGTACACA 4380

TAAAAACAGG GGTTCAGACT GATTCTAAAG GGATAGTTCA AATTTTGGAT GAGAAAGGCG 4440

AAAAATCACT GAATTTCATG GTCCATCTCG GATTGATCAA AAGAAAAGTA GGCAGAATGT 4500
ACTCTGTTGA ATACTGTAAA CAGAAAATCG AGAAAATGAG ATTGATATTT TCTTTAGGAC 4560

TAGTTGGAGG AATCAGTCTT CATGTCAATG CAACTGGGTC CATATCAAAA ACACTAGCAA 4620

GTCAGCTGGT ATTCAAAAGA GAGATTTGTT ATCCTTTAAT GGATCTAAAT CCGCATCTCA 4680

ATCTAGTTAT CTGGGCTTCA TCAGTAGAGA TTACAAGAGT GGATGCAATT TTCCAACCTT 4740

CTTTACCTGG CGAGTTCAGA TACTATCCTA ATATTATTGC AAAAGGAGTT GGGAAAATCA 4800

AACAATGGAA CTAGTAATCT CTATTTTAGT CCGGACGTAT CTATTAAGCC GAAGCAAATA 4860

AAGGATAATC AAAAACTTAG GACAAAAGAG GTCAATACCA ACAACTATTA GCAGTCACAC 4920

TCGCAAGAAT AAGAGAGAAG GGACCAAAAA AGTCAAATAG GAGAAATCAA AACAAAAGGT 4980

ACAGAACACC AGAACAACAA AATCAAAACA TCCAACTCAC TCAAAACAAA AATTCCAAAA 5040

GAGACCGGCA ACACAACAAG CACTGAACAC AATGCCAACT TCAATACTGC TAATTATTAC 5100

AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 5160

ATTGGTCAAC AGTCCCAAAG GGATGAAGAT ATCACAAAAC TTTGAAACAA GATATCTAAT 5220

TTTGAGCCTC ATACCAAAAA TAGAAGACTC TAACTCTTGT GGTGACCAAC AGATCAAGCA 5280

ATACAAGAAG TTATTGGATA GACTGATCAT CCCTTTATAT GATGGATTAA GATTACAGAA 5340

AGATGTGATA GTAACCAATC AAGAATCCAA TGAAAACACT GATCCCAGAA CAAAACGATT 5400

CTTTGGAGGG GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC 5460

GGCAGTTGCT CTGGTTGAAG CCAAGCAGGC AAGATCAGAC ATCGAAAAAC TCAAAGAAGC 5520

AATTAGGGAC ACAAATAAAG CAGTGCAGTC AGTTCAGAGC TCCATAGGAA ATTTAATAGT 5580

AGCAATTAAA TCAGTCCAGG ATTATGTTAA CAAAGAAATC GTGCCATCGA TTGCGAGGCT 5640

AGGTTGTGAA GCAGCAGGAC TTCAATTAGG AATTGCATTA ACACAGCATT ACTCAGAATT 5700

AACAAACATA TTTGGTGATA ACATAGGATC GTTACAAGAA AAAGGAATAA AATTACAAGG 5760

TATAGCATCA TTATACCGCA CAAATATCAC AGAAATATTC ACAACATCAA CAGTTGATAA 5820

ATATGATATC TATGATCTGT TATTTACAGA ATCAATAAAG GTGAGAGTTA TAGATGTTGA 5880

CTTGAATGAT TACTCAATCA CCCTCCAAGT CAGACTCCCT TTATTAACTA GGCTGCTGAA 5940

CACTCAGATC TACAAAGTAG ATTCCATATC ATATAACATC CAAAACAGAG AATGGTATAT 6000

CCCTCTTCCC AGCCATATCA TGACGAAAGG GGCATTTCTA GGTGGAGCAG ACGTCAAAGA 6060
ATGTATAGAA GCATTCAGCA GCTATATATG CCCTTCTGAT GCAGGATTTG TATTAAACCA 6120

TGAAATAGAG AGCTGCTTAT CAGGAAACAT ATCCCAATGT CCAAGAACAA CGGTCACATC 6180

AGACATTGTT CCAAGATATG CATTTGTCAA TGGAGGAGTG GTTGCAAACT GTATAACAAC 6240

CACCTGTACA TGCAACGGAA TTGGTAATAG AATCAATCAA CCACCTGATC AAGGAGTAAA 6300

AATTATAACA CATAAAGAAT GTAGTACAGT AGGTATCAAC GGAATGCTGT TCAATACAAA 6360

TAAAGAAGGA ACTCTTGCAT TCTATACACC AAATGATATA ACACTAAACA ATTCTGTTAC 6420

ACTTGATCCA ATTGACATAT CAATCGAGCT CAACAAGGCC AAATCAGATC TAGAAGAATC 6480

AAAAGAATGG ATAAGAAGGT CAAATCAAAA ACTAGATTCT ATTGGAAATT GGCATCAATC 6540

TAGCACTACA ATCATAATTA TTTTGATAAT GATCATTATA TTGTTTATAA TTAATATAAC 6600

GATAATTACA ATTGCAATTA AGTATTACAG AATTCAAAAG AGAAATCGAG TGGATCAAAA 6660

TGACAAGCCA TATGTACTAA CAAACAAATA ACATATCTAC AGATCATTAG ATATTAAAAT 6720

TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA 6780

CCCAATAGAC AAATCCAAAT TCGAGATGGA ATACTGGAAG CATACCAATC ACGGAAAGGA 6840

TGCTGGCAAT GAGCTGGAGA CGTCTATGGC TACTCATGGC AACAAGCTCA CTAATAAGAT 6900

AATATACATA TTATGGACAA TAATCCTGGT GTTATTATCA ATAGTCTTCA TCATAGTGCT 6960

AATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA 7020

GTTTATGGAA ATTACAGAAA AGATCCAAAT GGCATCGGAT AATACCAATG ATCTAATACA 7080

GTCAGGAGTG AATACAAGGC TTCTTACAAT TCAGAGTCAT GTCCAGAATT ACATACCAAT 7140

ATCATTGACA CAACAGATGT CAGATCTTAG GAAATTCATT AGTGAAATTA CAATTAGAAA 7200

TGATAATCAA GAAGTGCTGC CACAAAGAAT AACACATGAT GTAGGTATAA AACCTTTAAA 7260

TCCAGATGAT TTTTGGAGAT GCACGTCTGG TCTTCCATCT TTAATGAAAA CTCCAAAAAT 7320

AAGGTTAATG CCAGGGCCGG GATTATTAGC TATGCCAACG ACTGTTGATG GCTGTGTTAG 7380

AACTCCGTCT TTAGTTATAA ATGATCTGAT TTATGCTTAT ACCTCAAATC TAATTACTCG 7440

AGGTTGTCAG GATATAGGAA AATCATATCA AGTCTTACAG ATAGGGATAA TAACTGTAAA 7500

CTCAGACTTG GTACCTGACT TAAATCCTAG GATCTCTCAT ACCTTTAACA TAAATGACAA 7560

TAGGAAGTCA TGTTCTCTAG CACTCCTAAA TACAGATGTA TATCAACTGT GTTCAACTCC 7620
CAAAGTTGAT GAAAGATCAG ATTATGCATC ATCAGGCATA GAAGATATTG TACTTGATAT 7680

TGTCAATTAT GATGGTTCAA TCTCAACAAC AAGATTTAAG AATAATAACA TAAGCTTTGA 7740 

TCAACCATAT GCTGCACTAT ACCCATCTGT TGGACCAGGG ATATACTACA AAGGCAAAAT 7800 

AATATTTCTC GGGTATGGAG GTCTTGAACA TCCAATAAAT GAGAATGTAA TCTGCAACAC 7860

AACTGGGTGC CCCGGGAAAA CACAGAGAGA CTGTAATCAA GCGTCTCATA GTCCATGGTT 7920

TTCAGATAGG AGGATGGTCA ACTCCATCAT TGTTGCTGAC AAAGGCTTAA ACTCAATTCC 7980 

AAAATTGAAA GTATGGACGA TATCTATGCG ACAAAATTAC TGGGGGTCAG AAGGAAGGTT 8040 

ACTTCTACTA GGTAACAAGA TCTATATATA TACAAGATCT ACAAGTTGGC ATAGCAAGTT 8100 

ACAATTAGGA ATAATTGATA TTACTGATTA CAGTGATATA AGGATAAAAT GGACATGGCA 8160 

TAATGTGCTA TCAAGACCAG GAAACAATGA ATGTCCATGG GGACATTCAT GTCCAGATGG 8220

ATGTATAACA GGAGTATATA CTGATGCATA TCCACTCAAT CCCACAGGGA GCATTGTGTC 8280 

ATCTGTCATA TTAGACTCAC AAAAATCGAG AGTGAACCCA GTCATAACTT ACTCAACAGC 8340

AACCGAAAGA GTAAACGAGC TGGCCATCCT AAACAGAACA CTCTCAGCTG GATATACAAC 8400

AACAAGCTGC ATTACACACT ATAACAAAGG ATATTGTTTT CATATAGTAG AAATAAATCA 8460 

TAAAAGCTTA AACACATTTC AACCCATGTT GTTCAAAACA GAGATTCCAA AAAGCTGCAG 8520 

TTAATCATAA TTAACCATAA TATGCATCAA TCTATCTATA ATACAAGTAT ATGATAAGTA 8580

ATCAGCAATC AGACAATAGA CAAAAGGGAA ATATAAAAAA CTTAGGAGCA AAGCGTGCTC 8640 

GGGAAATGGA CACTGAATCT AACAATGGCA CTGTATCTGA CATACTCTAT CCTGAGTGTC 8700 

ACCTTAACTC TCCTATCGTT AAAGGTAAAA TAGCACAATT ACACACTATT ATGAGTCTAC 8760

CTCAGCCTTA TGATATGGAT GACGACTCAA TACTAGTTAT CACTAGACAG AAAATAAAAC 8820 

TTAATAAATT GGATAAAAGA CAACGATCTA TTAGAAGATT AAAATTAATA TTAACTGAAA 8880 

AAGTGAATGA CTTAGGAAAA TACACATTTA TCAGATATCC AGAAATGTCA AAAGAAATGT 8940 

TCAAATTATA TATACCTGGT ATTAACAGTA AAGTGACTGA ATTATTACTT AAAGCAGATA 9000 

GAACATATAG TCAAATGACT GATGGATTAA GAGATCTATG GATTAATGTG CTATCAAAAT 9060 

TAGCCTCAAA AAATGATGGA AGCAATTATG ATCTTAATGA AGAAATTAAT AATATATCGA 9120 

AAGTTCACAC AACCTATAAA TCAGATAAAT GGTATAATCC ATTCAAAACA TGGTTTACTA 9180
TCAAGTATGA TATGAGAAGA TTACAAAAAG CTCGAAATGA GATCACTTTT AATGTTGGGA 9240

AGGATTATAA CTTGTTAGAA GACCAGAAGA ATTTCTTATT GATACATCCA GAATTGGTTT 9300

TGATATTAGA TAAACAAAAC TACAATGGTT ATCTAATTAC TCCTGAATTA GTATTGATGT 9360

ATTGTGACGT AGTCGAAGGC CGATGGAATA TAAGTGCATG TGCTAAGTTA GATCCAAAAT 9420

TACAATCTAT GTATCAGAAA GGTAATAACC TGTGGGAAGT GATAGATAAA TTGTTTCCAA 9480

TTATGGGAGA AAAGACATTT GATGTGATAT CGTTATTAGA ACCACTTGCA TTATCCTTAA 9540

TTCAAACTCA TGATCCTGTT AAACAACTAA GAGGAGCTTT TTTAAATCAT GTGTTATCCG 9600

AGATGGAATT AATATTTGAA TCTAGAGAAT CGATTAAGGA ATTTCTGAGT GTAGATTACA 9660

TTGATAAAAT TTTAGATATA TTTAATAAGT CTACAATAGA TGAAATAGCA GAGATTTTCT 9720

CTTTTTTTAG AACATTTGGG CATCCTCCAT TAGAAGCTAG TATTGCAGCA GAAAAGGTTA 9780

GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 9840

TCTTCTGTAC AATAATAATT AACGGATATA GAGAGAGGCA TGGTGGACAG TGGCCTCCTG 9900

TGACATTACC TGATCATGCA CACGAATTCA TCATAAATGC TTACGGTTCA AACTCTGCGA 9960

TATCATATGA GAATGCTGTT GATTATTACC AGAGCTTTAT AGGAATAAAA TTCAATAAAT 10020

TCATAGAGCC TCAGTTAGAT GAGGATTTGA CAATTTATAT GAAAGATAAA GCATTATCTC 10080

CAAAAAAATC AAATTGGGAC ACAGTTTATC CTGCATCTAA TTTACTGTAC CGTACTAACG 10140

CATCCAACGA ATCACGAAGA TTAGTTGAAG TATTTATAGC AGATAGTAAA TTTGATCCTC 10200

ATCAGATATT GGATTATGTA GAATCTGGGG ACTGGTTAGA TGATCCAGAA TTTAATATTT 10260

CTTATAGTCT TAAAGAAAAA GAGATCAAAC AGGAAGGTAG ACTCTTTGCA AAAATGACAT 10320

ACAAAATGAG AGCTACACAA GTTTTATCAG AGACACTACT TGCAAATAAC ATAGGAAAAT 10380

TCTTTCAAGA AAATGGGATG GTGAAGGGAG AGATTGAATT ACTTAAGAGA TTAACAACCA 10440

TATCAATATC AGGAGTTCCA CGGTATAATG AAGTGTACAA TAATTCTAAA AGCCATACAG 10500

ATGACCTTAA AACCTACAAT AAAATAAGTA ATCTTAATTT GTCTTCTAAT CAGAAATCAA 10560

AGAAATTTGA ATTCAAGTCA ACGGATATCT ACAATGATGG ATACGAGACT GTGAGCTGTT 10620

TCCTAACAAC AGATCTCAAA AAATACTGTC TTAATTGGAG ATATGAATCA ACAGCTCTAT 10680

TTGGAGAAAC TTGCAACCAA ATATTTGGAT TAAATAAATT GTTTAATTGG TTACACCCTC 10740
GTCTTGAAGG AAGTACAATC TATGTAGGTG ATCCTTACTG TCCTCCATCA GATAAAGAAC 10800

ATATATCATT AGAGGATCAC CCTGATTCTG GTTTTTACGT TCATAACCCA AGAGGGGGTA 10860 

TAGAAGGATT TTGTCAAAAA TTATGGACAC TCATATCTAT AAGTGCAATA CATCTAGCAG 10920 

CTGTTAGAAT AGGCGTGAGG GTGACTGCAA TGGTTCAAGG AGACAATCAA GCTATAGCTG 10980 

TAACCACAAG AGTACCCAAC AATTATGACT ACAGAGTTAA GAAGGAGATA GTTTATAAAG 11040 

ATGTAGTGAG ATTTTTTGAT TCATTAAGAG AAGTGATGGA TGATCTAGGT CATGAACTTA 11100

AATTAAATGA AACGATTATA AGTAGCAAGA TGTTCATATA TAGCAAAAGA ATCTATTATG 11160 

ATGGGAGAAT TCTTCCTCAA GCTCTAAAAG CATTATCTAG ATGTGTCTTC TGGTCAGAGA 11220 

CAGTAATAGA CGAAACAAGA TCAGCATCTT CAAATTTGGC AACATCATTT GCAAAAGCAA 11280

TTGAGAATGG TTATTCACCT GTTCTAGGAT ATGCATGCTC AATTTTTAAG AACATTCAAC 11340 

AACTATATAT TGCCCTTGGG ATGAATATCA ATCCAACTAT AACACAGAAT ATCAGAGATC 11400 

AGTATTTTAG GAATCCAAAT TGGATGCAAT ATGCCTCTTT AATACCTGCT AGTGTTGGGG 11460 

GATTCAATCA CATGGCCATG TCAAGATGTT TTGTAAGGAA TATTGGTGAT CCATCAGTTG 11520

CCGCATTGGC TGATATTAAA AGATTTATTA AGGCGAATCT ATTAGACCGA AGTGTTCTTT 11580 

ATAGGATTAT GAATCAAGAA CCAGGTGAGT CATCTTTTTT TGACTGGGCT TCAGATCCAT 11640 

ATTCATGCAA TTTACCACAA TCTCAAAATA TAACCACCAT GATAAAAAAT ATAACAGCAA 11700 

GGAATGTATT ACAAGATTCA CCAAATCCAT TATTATCTGG ATTATTCACA AATACAATGA 11760

TAGAAGAAGA TGAAGAATTA GCTGAGTTCC TGATGGACAG GAAGGTAATT CTCCCTAGAG 11820 

TTGCACATGA TATTCTAGAT AATTCTCTCA CAGGAATTAG AAATGCCATA GCTGGAATGT 11880 

TAGATACGAC AAAATCACTA ATTCGGGTTG GCATAAATAG AGGAGGACTG ACATATAGTT 11940 

TGTTGAGGAA AATCAGTAAT TACGATCTAG TACAATATGA AACACTAAGT AGGACTTTGC 12000 

GACTAATTGT AAGTGATAAA ATCAAGTATG AAGATATGTG TTCGGTAGAC CTTGCCATAG 12060

CATTGCGACA AAAGATGTGG ATTCATTTAT CAGGAGGAAG GATGATAAGT GGACTTGAAA 12120 

CGCCTGACCC ATTAGAATTA CTATCTGGGG TAGTAATAAC AGGATCAGAA CATTGTAAAA 12180 

TATGTTATTC TTCAGATGGC ACAAACCCAT ATACTTGGAT GTATTTACCC GGTAATATCA 12240 

AAATAGGATC AGCAGAAACA GGTATATCGT CATTAAGAGT TCCTTATTTT GGATCAGTCA 12300

CTGATGAAAG ATCTGAAGCA CAATTAGGAT ATATCAAGAA TCTTAGTAAA CCTGCAAAAG 12360

CCGCAATAAG AATAGCAATG ATATATACAT GGGCATTTGG TAATGATGAG ATATCTTGGA 12420

TGGAAGCCTC ACAGATAGCA CAAACACGTG CAAATTTTAC ACTAGATAGT CTCAAAATTT 12480 

TAACACCGGT AGCTACATCA ACAAATTTAT CACACAGATT AAAGGATACT GCAACTCAGA 12540 

TGAAATTCTC CAGTACATCA TTGATCAGAG TCAGCAGATT TATAACAATG TCCAATGATA 12600 

ACATGTCTAT CAAAGAAGCT AATGAAACCA AAGATACTAA TCTTATTTAT CAACAAATAA 12660 

TGTTAACAGG ATTAAGTGTT TTCGAATATT TATTTAGATT AAAAGAAACC ACAGGACACA 12720

ACCCTATAGT TATGCATCTG CACATAGAAG ATGAGTGTTG TATTAAAGAA AGTTTTAATG 12780 

ATGAACATAT TAATCCAGAG TCTACATTAG AATTAATTCG ATATCCTGAA AGTAATGAAT 12840

TTATTTATGA TAAAGACCCA CTCAAAGATG TGGACTTATC AAAACTTATG GTTATTAAAG 12900 

ACCATTCTTA CACAATTGAT ATGAATTATT GGGATGATAC TGACATCATA CATGCAATTT 12960 

CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA 13020

AAGAGATAAT AGTTATTGCA AATGATGATG ATATTAATAG CTTAATCACT GAATTTTTGA 13080 

CTCTTGACAT ACTTGTATTT CTCAAGACAT TTGGTGGATT ATTAGTAAAT CAATTTGCAT 13140

ACACTCTTTA TAGTCTAAAA ATAGAAGGTA GGGATCTCAT TTGGGATTAT ATAATGAGAA 13200 

CACTGAGAGA TACTTCCCAT TCAATATTAA AAGTATTATC TAATGCATTA TCTCATCCTA 13260

AAGTATTCAA GAGGTTCTGG GATTGTGGAG TTTTAAACCC TATTTATGGT CCTAATATTG 13320 

CTAGTCAAGA CCAGATAAAA CTTGCCCTAT CTATATGTGA ATATTCACTA GATCTATTTA 13380 

TGAGAGAATG GTTGAATGGT GTATCACTTG AAATATACAT TTGTGACAGC GATATGGAAG 13440 

TTGCAAATGA TAGGAAACAA GCCTTTATTT CTAGACACCT TTCATTTGTT TGTTGTTTAG 13500

CAGAAATTGC ATCTTTCGGA CCTAACCTGT TAAACTTAAC ATACTTGGAG AGACTTGATC 13560 

TATTGAAACA ATATCTTGAA TTAAATATTA AAGAAGACCC TACTCTTAAA TATGTACAAA 13620

TATCTGGATT ATTAATTAAA TCGTTCCCAT CAACTGTAAC ATACGTAAGA AAGACTGCAA 13680 

TCAAATATCT AAGGATTCGC GGTATTAGTC CACCTGAGGT AATTGATGAT TGGGATCCGG 13740 

TAGAAGATGA AAATATGCTG GATAACATTG TCAAAACTAT AAATGATAAC TGTAATAAAG 13800

ATAATAAAGG GAATAAAATT AACAATTTCT GGGGACTAGC ACTTAAGAAC TATCAAGTCC 13860
TTAAAATCAG ATCTATAACA AGTGATTCTG ATGATAATGA TAGACTAGAT GCTAATACAA 13920

GTGGTTTGAC ACTTCCTCAA GGAGGGAATT ATCTATCGCA TCAATTGAGA TTATTCGGAA 13980

TCAACAGCAC TAGTTGTCTG AAAGCTCTTG AGTTATCACA AATTTTAATG AAGGAAGTCA 14040

ATAAAGACAA GGACAGGCTC TTCCTGGGAG AAGGAGCAGG AGCTATGCTA GCATGTTATG 14100

ATGCCACATT AGGACCTGCA GTTAATTATT ATAATTCAGG TTTGAATATA ACAGATGTAA 14160

TTGGTCAACG AGAATTGAAA ATATTTCCTT CAGAGGTATC ATTAGTAGGT AAAAAATTAG 14220

GAAATGTGAC ACAGATTCTT AACAGGGTAA AAGTACTGTT CAATGGGAAT CCTAATTCAA 14280

CATGGATAGG AAATATGGAA TGTGAGAGCT TAATATGGAG TGAATTAAAT GATAAGTCCA 14340

TTGGATTAGT ACATTGTGAT ATGGAAGGAG CTATCGGTAA ATCAGAAGAA ACTGTTCTAC 14400

ATGAACATTA TAGTGTTATA AGAATTACAT ACTTGATTGG GGATGATGAT GTTGTTTTAG 14460

TTTCCAAAAT TATACCTACA ATCACTCCGA ATTGGTCTAG AATACTTTAT CTATATAAAT 14520

TATATTGGAA AGATGTAAGT ATAATATCAC TCAAAACTTC TAATCCTGCA TCAACAGAAT 14580

TATATCTAAT TTCGAAAGAT GCATATTGTA CTATAATGGA ACCTAGTGAA ATTGTTTTAT 14640

CAAAACTTAA AAGATTGTCA CTCTTGGAAG AAAATAATCT ATTAAAATGG ATCATTTTAT 14700

CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCAA AGAAGGAGAA AGAGATTATG 14760

GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC 14820

ATCTGGCGAA AGAATTTTTA TCAACCCCAG ATCTGACTAA TATCAACAAT ATAATCCAAA 14880

GTTTTCAGCG AACAATAAAG GATGTTTTAT TTGAATGGAT TAATATAACT CATGATGATA 14940

AGAGACATAA ATTAGGCGGA AGATATAACA TATTCCCACT GAAAAATAAG GGAAAGTTAA 15000

GACTGCTATC GAGAAGACTA GTATTAAGTT GGATTTCATT ATCATTATCG ACTCGATTAC 15060

TTACAGGTCG CTTTCCTGAT GAAAAATTTG AACATAGAGC ACAGACTGGA TATGTATCAT 15120

TAGCTGATAC TGATTTAGAA TCATTAAAGT TATTGTCGAA AAACATCATT AAGAATTACA 15180

GAGAGTGTAT AGGATCAATA TCATATTGGT TTCTAACCAA AGAAGTTAAA ATACTTATGA 15240

AATTGATTGG TGGTGCTAAA TTATTAGGAA TTCCCAGACA ATATAAAGAA CCCGAAGACC 15300

AGTTATTAGA AAACTACAAT CAACATGATG AATTTGATAT CGATTAAAAC ATAAATACAA 15360

TGAAGATATA TCCTAACCTT TATCTTTAAG CCTAGGAATA GACAAAAAGT AAGAAAAACA 15420
TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT 15462



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2233 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro
1 5 10 15

Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln Leu
20 25 30

His Thr Ile Met Ser Leu Pro Gln Pro Tyr Asp Met Asp Asp Asp Ser
35 40 45

Ile Leu Val Ile Thr Arg Gln Lys Ile Lys Leu Asn Lys Leu Asp Lys
50 55 60

Arg Gln Arg Ser Ile Arg Arg Leu Lys Leu Ile Leu Thr Glu Lys Val
65 70 75 80

Asn Asp Leu Gly Lys Tyr Thr Phe Ile Arg Tyr Pro Glu Met Ser Lys
85 90 95

Glu Met Phe Lys Leu Tyr Ile Pro Gly Ile Asn Ser Lys Val Thr Glu
100 105 110

Leu Leu Leu Lys Ala Asp Arg Thr Tyr Ser Gln Met Thr Asp Gly Leu
115 120 125

Arg Asp Leu Trp Ile Asn Val Leu Ser Lys Leu Ala Ser Lys Asn Asp
130 135 140

Gly Ser Asn Tyr Asp Leu Asn Glu Glu Ile Asn Asn Ile Ser Lys Val
145 150 155 160

His Thr Thr Tyr Lys Ser Asp Lys Trp Tyr Asn Pro Phe Lys Thr Trp
165 170 175

Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gln Lys Ala Arg Asn Glu
180 185 190

Ile Thr Phe Asn Val Gly Lys Asp Tyr Asn Leu Leu Glu Asp Gln Lys
195 200 205

Asn Phe Leu Leu Ile His Pro Glu Leu Val Leu Ile Leu Asp Lys Gln
210 215 220

Asn Tyr Asn Gly Tyr Leu Ile Thr Pro Glu Leu Val Leu Met Tyr Cys
225 230 235 240

Asp Val Val Glu Gly Arg Trp Asn Ile Ser Ala Cys Ala Lys Leu Asp
245 250 255

Pro Lys Leu Gln Ser Met Tyr Gln Lys Gly Asn Asn Leu Trp Glu Val
260 265 270

Ile Asp Lys Leu Phe Pro Ile Met Gly Glu Lys Thr Phe Asp Val Ile
275 280 285

Ser Leu Leu Glu Pro Leu Ala Leu Ser Leu Ile Gln Thr His Asp Pro
290 295 300

Val Lys Gln Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met
305 310 315 320

Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val
325 330 335

Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp
340 345 350

Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro
355 360 365

Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile
370 375 380

Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala Ile Phe
385 390 395 400

Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp
405 410 415

Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala
420 425 430

Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr
435 440 445

Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu
450 455 460
Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys
465 470 475 480

Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg
485 490 495

Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala
500 505 510

Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly
515 520 525

Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu
530 535 540

Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys
545 550 555 560

Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile
565 570 575

Gly Lys Phe Phe Gln Glu Asn Gly Met Val Lys Gly Glu Ile Glu Leu
580 585 590

Leu Lys Arg Leu Thr Thr Ile Ser Ile Ser Gly Val Pro Arg Tyr Asn
595 600 605

Glu Val Tyr Asn Asn Ser Lys Ser His Thr Asp Asp Leu Lys Thr Tyr
610 615 620

Asn Lys Ile Ser Asn Leu Asn Leu Ser Ser Asn Gln Lys Ser Lys Lys
625 630 635 640

Phe Glu Phe Lys Ser Thr Asp Ile Tyr Asn Asp Gly Tyr Glu Thr Val
645 650 655

Ser Cys Phe Leu Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Ser Thr Ala Leu Phe Gly Glu Thr Cys Asn Gln Ile Phe Gly
675 680 685

Leu Asn Lys Leu Phe Asn Trp Leu His Pro Arg Leu Glu Gly Ser Thr
690 695 700

Ile Tyr Val Gly Asp Pro Tyr Cys Pro Pro Ser Asp Lys Glu His Ile
705 710 715 720

Ser Leu Glu Asp His Pro Asp Ser Gly Phe Tyr Val His Asn Pro Arg
725 730 735
Gly Gly Ile Glu Gly Phe Cys Gln Lys Leu Trp Thr Leu Ile Ser Ile
740 745 750

Ser Ala Ile His Leu Ala Ala Val Arg Ile Gly Val Arg Val Thr Ala
755 760 765

Met Val Gln Gly Asp Asn Gln Ala Ile Ala Val Thr Thr Arg Val Pro
770 775 780

Asn Asn Tyr Asp Tyr Arg Val Lys Lys Glu Ile Val Tyr Lys Asp Val
785 790 795 800

Val Arg Phe Phe Asp Ser Leu Arg Glu Val Met Asp Asp Leu Gly His
805 810 815

Glu Leu Lys Leu Asn Glu Thr Ile Ile Ser Ser Lys Met Phe Ile Tyr
820 825 830

Ser Lys Arg Ile Tyr Tyr Asp Gly Arg Ile Leu Pro Gln Ala Leu Lys
835 840 845

Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr
850 855 860

Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu
865 870 875 880

Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn
885 890 895

Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile
900 905 910

Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln
915 920 925

Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn His Met Ala
930 935 940

Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala
945 950 955 960

Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser
965 970 975

Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Phe
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn
995 1000 1005

Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp
1010 1015 1020

Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu
1025 1030 1035 1040

Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu
1045 1050 1055

Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg
1060 1065 1070

Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val
1075 1080 1085

Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser
1090 1095 1100

Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu
1105 1110 1115 1120

Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu
1125 1130 1135

Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg
1140 1145 1150

Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly
1155 1160 1165

Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp
1170 1175 1180

Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile
1185 1190 1195 1200

Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly
1205 1210 1215

Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn
1220 1225 1230

Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr
1235 1240 1245

Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile
1250 1255 1260

Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr
1265 1270 1275 1280

Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Leu Lys Asp Thr Ala
1285 1290 1295
Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe
1300 1305 1310

Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr
1315 1320 1325

Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser
1330 1335 1340

Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro
1345 1350 1355 1360

Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser
1365 1370 1375

Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg
1380 1385 1390

Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp
1395 1400 1405

Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile
1410 1415 1420

Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile
1425 1430 1435 1440

Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp
1445 1450 1455

Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser
1460 1465 1470

Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr
1475 1480 1485

Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu
1490 1495 1500

Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu
1505 1510 1515 1520

Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser
1525 1530 1535

His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro
1540 1545 1550

Ile Tyr Gly Pro Asn Ile Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu
1555 1560 1565






Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn
1570 1575 1580

Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala
1585 1590 1595 1600

Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys
1605 1610 1615

Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr
1620 1625 1630

Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile
1635 1640 1645

Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile
1650 1655 1660

Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys
1665 1670 1675 1680

Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp
1685 1690 1695

Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile
1700 1705 1710

Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe
1715 1720 1725

Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile
1730 1735 1740

Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly
1745 1750 1755 1760

Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu
1765 1770 1775

Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln
1780 1785 1790

Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly
1795 1800 1805

Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro
1810 1815 1820

Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly
1825 1830 1835 1840

Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys
1845 1850 1855

Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe
1860 1865 1870

Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser
1875 1880 1885

Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys
1890 1895 1900

Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu
1905 1910 1915 1920

His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val
1925 1930 1935

Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg
1940 1945 1950

Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser
1955 1960 1965

Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys
1970 1975 1980

Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys
1985 1990 1995 2000

Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile
2005 2010 2015

Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys
2020 2025 2030

Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu
2035 2040 2045

Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe
2050 2055 2060



Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe
2065 2070 2075 2080

Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His
2085 2090 2095

Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu
2100 2105 2110



Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser
2115 2120 2125

Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro
2130 2135 2140

Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala
2145 2150 2155 2160

Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys
2165 2170 2175

Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys
2180 2185 2190

Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly
2195 2200 2205

Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr
2210 2215 2220

Asn Gln His Asp Glu Phe Asp Ile Asp
2225 2230

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15462 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

ACCAAACAAG AGAAGAAACT TGCTTGGTAA TATAAATTTA ACTTAAAATT AACTTAGGAT 60

TTAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA ATGTTGAGCC 120 

TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC AAAATCAGCC GGTGGAGCTA 180 

TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT TGGACCGACA ATAACTGATG 240 

ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA TTCACTAGAT AATGAGAAAC 300 

AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT GCCAATCCAG 360 

AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGCCAA GTATGTCATA TACATGATTG 420 

AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA GAGATGATAT 480 
ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT 540

TGCAGAACGG CAGGAACAAT TCAACAATTG AAGACCTTGT CCACACATTT GGGTATCCAT 600

CATGTTTAGG AGCTCTTATA ATACAGATCT GGATAGTTCT GGTCAAAGCT ATCACTAGTA 660

TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 720

TGCAGGCAGG GCTGGTATTG AGCGGTGACA CAGTGGATCA GATTGGGTCA ATCATGCGGT 780

CTCAACAGAG CTTGGTAACT CTTATGGTTG AAACATTAAT AACAATGAAT ACCAGCAGAA 840

ATGACCTCAC AACCATAGAA AAGAATATAC AAATTGTTGG CAACTACATA AGAGATGCAG 900

GTCTCGCTTC ATTCTTCAAT ACAATCAGAT ATGGAATTGA GACCAGAATG GCAGCTTTGA 960

CTCTATCCAC TCTCAGACCA GATATCAATA GATTAAAAGC TTTGATGGAA CTGTATTTAT 1020

CAAAGGGACC ACGCGCTCCT TTCATCTGTA TCCTCAGAGA TCCTATACAT GGTGAGTTCG 1080

CACCAGGCAA CTATCCTGCC ATATGGAGCT ATGCAATGGG GGTGGCAGTT GTACAAAATA 1140

GAGCCATGCA ACAGTATGTG AGGGGAAGAT CATATCTAGA CATTGATATG TTCCAGCTAG 1200

GACAAGCAGT AGCACGTGAT GCCGAAGCTC AAATGAGCTC AACACTGGAA GATGAACTTG 1260

GAGTGACACA CGAAGCTAAA GAAAGCTTGA AGAGACATAT AAGGAACATA AACAGTTCAG 1320

AGACATCTTT CCACAAACCG ACAGGTGGAT CAGCCATAGA GATGGCAATA GATGAAGAGC 1380

CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA 1440

TTCAATATGC CTGGGCAGAA GGAAATAGAA GCGATGATCA GACTGAGCAA GCTACAGAAT 1500

CTGACAATAT CAAGACCGAA CAACAAAACA TCAGAGACAG ACTAAACAAG AGACTCAACG 1560

ACAAGAAGAA ACAAAGCAGT CAACCACCCA CTAATCCCAC AAACAGAACA AACCAGGACG 1620

AAATAGATGA TCTGTTTAAC GCATTTGGAA GCAACTAATC GAATCAACAT TTTAATCTAA 1680

ATCAATAATA AATAAGAAAA ACTTAGGATT AAAGAATCCT ATCATACCGG AATATAGGGT 1740

GGTAAATTTA GAGTCTGCTT GAAACTCAAT CAATAGAGAG TTGATGGAAA GCGATGCTAA 1800

AAACTATCAA ATCATGGATT CTTGGGAAGA GGAATCAAGA GATAAATCAA CTAATATCTC 1860

CTCGGCCCTC AACATCATTG AATTCATACT CAGCACCGAC CCCCAAGAAG ACTTATCGGA 1920

AAACGACACA ATCAACACAA GAACCCAGCA ACTCAGTGCC ACCATCTGTC AACCAGAAAT 1980

CAAACCAACA GAAACAAGTG AGAAAGATAG TGGATCAACT GACAAAAATA GACAGTCTGG 2040
GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAC ATTGATCAGG AAACTGTACA 2100

GAGAGGACCT GGGAGAAGAA GCAGCTCAGA TAGTAGAGCT GAGACTGTGG TCTCTGGAGG 2160 

AATCCCCAGA AGCATCACAG ATTCTAAAAA TGGAACCCAA AACACGGAGG ATATTGATCT 2220

CAATGAAATT AGAAAGATGG ATAAGGACTC TATTGAGGGG AAAATGCGAC AATCTGCAAA 2280

TGTTCCAAGC GAGATATCAG GAAGTGATGA CATATTTACA ACAGAACAAA GTAGAAACAG 2340

TGATCATGGA AGAAGCCTGG AATCTATCAG TACACCTGAT ACAAGATCAA TAAGTGTTGT 2400 

TACTGCTGCA ACACCAGATG ATGAAGAAGA AATACTAATG AAAAATAGTA GGACAAAGAA 2460 

AAGTTCTTCA ACACATCAAG AAGATGACAA AAGAATTAAA AAAGGGGGAA AAGGGAAAGA 2520

CTGGTTTAAG AAATCAAAAG ATACCGACAA CCAGATACCA ACATCAGACT ACAGATCCAC 2580

ATCAAAAGGG CAGAAGAAAA TCTCAAAGAC AACAACCACC AACACCGACA CAAAGGGGCA 2640 

AACAGAAATA CAGACAGAAT CATCAGAAAC ACAATCCTCA TCATGGAATC TCATCATCGA 2700

CAACAACACC GACCGGAACG AACAGACAAG CACAACTCCT CCAACAACAA CTTCCAGATC 2760 

AACTTATACA AAAGAATCGA TCCGAACAAA CTCTGAATCC AAACCCAAGA CACAAAAGAC 2820 

AAATGGAAAG GAAAGGAAGG ATACAGAAGA GAGCAATCGA TTTACAGAGA GGGCAATTAC 2880

TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAAA CTAGATTTAT ATCAAGACAA 2940

ACGAGTTGTA TGTGTAGCAA ATGTACTAAA CAATGTAGAT ACTGCATCAA AGATAGATTT 3000 

CCTGGCAGGA TTAGTCATAG GGGTTTCAAT GGACAACGAC ACAAAATTAA CACAGATACA 3060

AAATGAAATG CTAAACCTCA AAGCAGATCT AAAGAAAATG GACGAATCAC ATAGAAGATT 3120

GATAGAAAAT CAAAGAGAAC AACTGTCATT GATCACGTCA CTAATTTCAA ATCTCAAAAT 3180 

TATGACTGAG AGAGGAGGAA AGAAAGACCA AAATGAATCC AATGAGAGAG TATCCATGAT 3240

CAAAACAAAA TTGAAAGAAG AAAAGATCAA GAAGACCAGG TTTGACCCAC TTATGGAGGC 3300 

ACAAGGCATT GACAAGAATA TACCCGATCT ATATCGACAT GCAGGAGATA CACTAGAGAA 3360

CGATGTACAA GTTAAATCAG AGATATTAAG TTCATACAAT GAGTCAAATG CAACAAGACT 3420 

AATACCCAAA AAAGTGAGCA GTACAATGAG ATCACTAGTT GCAGTCATCA ACAACAGCAA 3480

TCTCTCACAA AGCACAAAAC AATCATACAT AAACGAACTC AAACGTTGCA AAAATGATGA 3540

AGAAGTATCT GAATTAATGG ACATGTTCAA TGAAGATGTC AACAATTGCC AATGATCCAA 3600

CAAAGAAACG ACACCGAACA AACAGACAAG AAACAACAGT AGATCAAAAC CTGTCAACAC 3660

ACACAAAATC AAGCAGAATG AAACAACAGA TATCAATCAA TATACAAATA AGAAAAACTT 3720 

AGGATTAAAG AATAAATTAA TCCTTGTCCA AAATGAGTAT AACTAACTCT GCAATATACA 3780 

CATTCCCAGA ATCATCATTC TCTGAAAATG GTCATATAGA ACCATTACCA CTCAAAGTCA 3840 

ATGAACAGAG GAAAGCAGTA CCCCACATTA GAGTTGCCAA GATCGGAAAT CCACCAAAAC 3900 

ACGGATCCCG GTATTTAGAT GTCTTCTTAC TCGGCTTCTT CGAGATGGAA CGAATCAAAG 3960

ACAAATACGG GAGTGTGAAT GATCTCGACA GTGACCCGAG TTACAAAGTT TGTGGCTCTG 4020 

GATCATTACC AATCGGATTG GCTAAGTACA CTGGGAATGA CCAGGAATTG TTACAAGCCG 4080 

CAACCAAACT GGATATAGAA GTGAGAAGAA CAGTCAAAGC GAAAGAGATG GTTGTTTACA 4140

CGGTACAAAA TATAAAACCA GAACTGTACC CATGGTCCAA TAGACTAAGA AAAGGAATGC 4200 

TGTTCGATGC CAACAAAGTT GCTCTTGCTC CTCAATGTCT TCCACTAGAT AGGAGCATAA 4260 

AATTTAGAGT AATCTTCGTG AATTGTACGG CAATTGGATC AATAACCTTG TTCAAAATTG 4320 

CTAAGTCAAT GGCATCACTA TCTCTAACCA ACACAATATC AATCAATCTG CAGGTACACA 4380 

TAAAAACAGG GGTTCAGACT GATTCTAAAG GGATAGTTCA AATTTTGGAT GAGAAAGGCG 4440

AAAAATCACT GAATTTCATG GTCCATCTCG GATTGATCAA AAGAAAAGTA GGCAGAATGT 4500 

ACTCTGTTGA ATACTGTAAA CAGAAAATCG AGAAAATGAG ATTGATATTT TCTTTAGGAC 4560 

TAGTTGGAGG AATCAGTCTT CATGTCAATG CAACTGGGTC CATATCAAAA ACACTAGCAA 4620

GTCAGCTGGT ATTCAAAAGA GAGATTTGTT ATCCTTTAAT GGATCTAAAT CCGCATCTCA 4680 

ATCTAGTTAT CTGGGCTTCA TCAGTAGAGA TTACAAGAGT GGATGCAATT TTCCAACCTT 4740

CTTTACCTGG CGAGTTCAGA TACTATCCTA ATATTATTGC AAAAGGAGTT GGGAAAATCA 4800 

AACAATGGAA CTAGTAATCT CTATTTTAGT CCGGACGTAT CTATTAAGCC GAAGCAAATA 4860

AAGGATAATC AAAAACTTAG GACAAAAGAG GTCAATACCA ACAACTATTA GCAGTCACAC 4920 

TCGCAAGAAT AAGAGAGAAG GGACCAAAAA AGTCAAATAG GAGAAATCAA AACAAAAGGT 4980

ACAGAACACC AGAACAACAA AATCAAAACA TCCAACTCAC TCAAAACAAA AATTCCAAAA 5040

GAGACCGGCA ACACAACAAG CACTGAACAC AATGCCAACT TCAATACTGC TAATTATTAC 5100 

AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 5160 
ATTGGTCAAC AGTCCCAAAG GGATGAAGAT ATCACAAAAC TTTGAAACAA GATATCTAAT 5220

TTTGAGCCTC ATACCAAAAA TAGAAGACTC TAACTCTTGT GGTGACCAAC AGATCAAGCA 5280

ATACAAGAAG TTATTGGATA GACTGATCAT CCCTTTATAT GATGGATTAA GATTACAGAA 5340

AGATGTGATA GTAACCAATC AAGAATCCAA TGAAAACACT GATCCCAGAA CAAAACGATT 5400

CTTTGGAGGG GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC 5460

GGCAGTTGCT CTGGTTGAAG CCAAGCAGGC AAGATCAGAC ATCGAAAAAC TCAAAGAAGC 5520

AATTAGGGAC ACAAATAAAG CAGTGCAGTC AGTTCAGAGC TCCATAGGAA ATTTAATAGT 5580

AGCAATTAAA TCAGTCCAGG ATTATGTTAA CAAAGAAATC GTGCCATCGA TTGCGAGGCT 5640

AGGTTGTGAA GCAGCAGGAC TTCAATTAGG AATTGCATTA ACACAGCATT ACTCAGAATT 5700

AACAAACATA TTTGGTGATA ACATAGGATC GTTACAAGAA AAAGGAATAA AATTACAAGG 5760

TATAGCATCA TTATACCGCA CAAATATCAC AGAAATATTC ACAACATCAA CAGTTGATAA 5820

ATATGATATC TATGATCTGT TATTTACAGA ATCAATAAAG GTGAGAGTTA TAGATGTTGA 5880

CTTGAATGAT TACTCAATCA CCCTCCAAGT CAGACTCCCT TTATTAACTA GGCTGCTGAA 5940

CACTCAGATC TACAAAGTAG ATTCCATATC ATATAACATC CAAAACAGAG AATGGTATAT 6000

CCCTCTTCCC AGCCATATCA TGACGAAAGG GGCATTTCTA GGTGGAGCAG ACGTCAAAGA 6060

ATGTATAGAA GCATTCAGCA GCTATATATG CCCTTCTGAT CCAGGATTTG TATTAAACCA 6120


TGAAATAGAG AGCTGCTTAT CAGGAAACAT ATCCCAATGT CCAAGAACAA CGGTCACATC 6180

AGACATTGTT CCAAGATATG CATTTGTCAA TGGAGGAGTG GTTGCAAACT GTATAACAA P. 6240

CACCTGTACA TGCAACGGAA TTGGTAATAG AATCAATCAA CCACCTGATC AAGGAGTAAA 6300

AATTATAACA CATAAAGAAT GTAGTACAGT AGGTATCAAC GGAATGCTGT TCAATACAAA 6360

TAAAGAAGGA ACTCTTGCAT TCTATACACC AAATGATATA ACACTAAACA ATTCTGTTAC 6420

ACTTGATCCA ATTGACATAT CAATCGAGCT CAACAAGGCC AAATCAGATC TAGAAGAATC 6480

AAAAGAATGG ATAAGAAGGT CAAATCAAAA ACTAGATTCT ATTGGAAATT GGCATCAATC 6540

TAGCACTACA ATCATAATTA TTTTGATAAT GATCATTATA TTGTTTATAA TTAATATAAC 6600

GATAATTACA ATTGCAATTA AGTATTACAG AATTCAAAAG AGAAATCGAG TGGATCAAAA 6660

TGACAAGCCA TATGTACTAA CAAACAAATA ACATATCTAC AGATCATTAG ATATTAAAAT 6720

TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA 6780

CCCAATAGAC AAATCCAAAT TCGAGATGGA ATACTGGAAG CATACCAATC ACGGAAAGGA 6840

TGCTGGCAAT GAGCTGGAGA CGTCTATGGC TACTCATGGC AACAAGCTCA CTAATAAGAT 6900

AATATACATA TTATGGACAA TAATCCTGGT GTTATTATCA ATAGTCTTCA TCATAGTGCT 6960

AATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA 7020

GTTTATGGAA ATTACAGAAA AGATCCAAAT GGCATCGGAT AATACCAATG ATCTAATACA 7080

GTCAGGAGTG AATACAAGGC TTCTTACAAT TCAGAGTCAT GTCCAGAATT ACATACCAAT 7140

ATCATTGACA CAACAGATGT CAGATCTTAG GAAATTCATT AGTGAAATTA CAATTAGAAA 7200

TGATAATCAA GAAGTGCTGC CACAAAGAAT AACACATGAT GTAGGTATAA AACCTTTAAA 7260

TCCAGATGAT TTTTGGAGAT GCACGTCTGG TCTTCCATCT TTAATGAAAA CTCCAAAAAT 7320

AAGGTTAATG CCAGGGCCGG GATTATTAGC TATGCCAACG ACTGTTGATG GCTGTGTTAG 7380

AACTCCGTCT TTAGTTATAA ATGATCTGAT TTATGCTTAT ACCTCAAATC TAATTACTCG 7440

AGGTTGTCAG GATATAGGAA AATCATATCA AGTCTTACAG ATAGGGATAA TAACTGTAAA 7500

CTCAGACTTG GTACCTGACT TAAATCCTAG GATCTCTCAT ACCTTTAACA TAAATGACAA 7560

TAGGAAGTCA TGTTCTCTAG CACTCCTAAA TACAGATGTA TATCAACTGT GTTCAACTCC 7620

CAAAGTTGAT GAAAGATCAG ATTATGCATC ATCAGGCATA GAAGATATTG TACTTGATAT 7680

TGTCAATTAT GATGGTTCAA TCTCAACAAC AAGATTTAAG AATAATAACA TAAGCTTTGA 7740

TCAACCATAT GCTGCACTAT ACCCATCTGT TGGACCAGGG ATATACTACA AAGGCAAAAT 7800

AATATTTCTC GGGTATGGAG GTCTTGAACA TCCAATAAAT GAGAATGTAA TCTGCAACAC 7860

AACTGGGTGC CCCGGGAAAA CACAGAGAGA CTGTAATCAA GCGTCTCATA GTCCATGGTT 7920

TTCAGATAGG AGGATGGTCA ACTCCATCAT TGTTGCTGAC AAAGGCTTAA ACTCAATTCC 7980

AAAATTGAAA GTATGGACGA TATCTATGCG ACAAAATTAC TGGGGGTCAG AAGGAAGGTT 8040

ACTTCTACTA GGTAACAAGA TCTATATATA TACAAGATCT ACAAGTTGGC ATAGCAAGTT 8100

ACAATTAGGA ATAATTGATA TTACTGATTA CAGTGATATA AGGATAAAAT GGACATGGCA 8160

TAATGTGCTA TCAAGACCAG GAAACAATGA ATGTCCATGG GGACATTCAT GTCCAGATGG 8220

ATGTATAACA GGAGTATATA CTGATGCATA TCCACTCAAT CCCACAGGGA GCATTGTGTC 8280

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ATCTGTCATA TTAGACTCAC AAAAATCGAG AGTGAACCCA GTCATAACTT ACTCAACAGC 8340 

AACCGAAAGA GTAAACGAGC TGGCCATCCT AAACAGAACA CTCTCAGCTG GATATACAAC 8400 

AACAAGCTGC ATTACACACT ATAACAAAGG ATATTGTTTT CATATAGTAG AAATAAATCA 8460 

TAAAAGCTTA AACACATTTC AACCCATGTT GTTCAAAACA GAGATTCCAA AAAGCTGCAG 8520

TTAATCATAA TTAACCATAA TATGCATCAA TCTATCTATA ATACAAGTAT ATGATAAGTA 8580 

ATCAGCAATC AGACAATAGA CAAAAGGGAA ATATAAAAAA CTTAGGAGCA AAGCGTGCTC 8640

GGGAAATGGA CACTGAATCT AACAATGGCA CTGTATCTGA CATACTCTAT CCTGAGTGTC 8700

ACCTTAACTC TCCTATCGTT AAAGGTAAAA TAGCACAATT ACACACTATT ATGAGTCTAC 8760

CTCAGCCTTA TGATATGGAT GACGACTCAA TACTAGTTAT CACTAGACAG AAAATAAAAC 8820

TTAATAAATT GGATAAAAGA CAACGATCTA TTAGAAGATT AAAATTAATA TTAACTGAAA 8880

AAGTGAATGA CTTAGGAAAA TACACATTTA TCAGATATCC AGAAATGTCA AAAGAAATGT 8940

TCAAATTATA TATACCTGGT ATTAACAGTA AAGTGACTGA ATTATTACTT AAAGCAGATA 9000

GAACATATAG TCAAATGACT GATGGATTAA GAGATCTATG GATTAATGTG CTATCAAAAT 9060

TAGCCTCAAA AAATGATGGA AGCAATTATG ATCTTAATGA AGAAATTAAT AATATATCGA 9120

AAGTTCACAC AACCTATAAA TCAGATAAAT GGTATAATCC ATTCAAAACA TGGTTTACTA 9180 

TCAAGTATGA TATGAGAAGA TTACAAAAAG CTCGAAATGA GATCACTTTT AATGTTGGGA 9240 

AGGATTATAA CTTGTTAGAA GACCAGAAGA ATTTCTTATT GATACATCCA GAATTGGTTT 9300

TGATATTAGA TAAACAAAAC TACAATGGTT ATCTAATTAC TCCTGAATTA GTATTGATGT 9360

ATTGTGACGT AGTCGAAGGC CGATGGAATA TAAGTGCATG TGCTAAGTTA GATCCAAAAT 9420

TACAATCTAT GTATCAGAAA GGTAATAACC TGTGGGAAGT GATAGATAAA TTGTTTCCAA 9480

TTATGGGAGA AAAGACATTT GATGTGATAT CGTTATTAGA ACCACTTGCA TTATCCTTAA 9540

TTCAAACTCA TGATCCTGTT AAACAACTAA GAGGAGCTTT TTTAAATCAT GTGTTATCCG 9600

AGATGGAATT AATATTTGAA TCTAGAGAAT CGATTAAGGA ATTTCTGAGT GTAGATTACA 9660

TTGATAAAAT TTTAGATATA TTTAATAAGT CTACAATAGA TGAAATAGCA GAGATTTTCT 9720

CTTTTTTTAG AACATTTGGG CATCCTCCAT TAGAAGCTAG TATTGCAGCA GAAAAGGTTA 9780

GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 9840
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TCTTCTGTAC AATAATAATT AACGGATATA GAGAGAGGCA TGGTGGACAG TGGCCTCCTG 9900 

TGACATTACC TGATCATGCA CACGAATTCA TCATAAATGC TTACGGTTCA AACTCTGCGA 9960 

TATCATATGA GAATGCTGTT GATTATTACC AGAGCTTTAT AGGAATAAAA TTCAATAAAT 10020 

TCATAGAGCC TCAGTTAGAT GAGGATTTGA CAATTTATAT GAAAGATAAA GCATTATCTC 10080 

CAAAAAAATC AAATTGGGAC ACAGTTTATC CTGCATCTAA TTTACTGTAC CGTACTAACG 10140 

CATCCAACGA ATCACGAAGA TTAGTTGAAG TATTTATAGC AGATAGTAAA TTTGATCCTC 10200 

ATCAGATATT GGATTATGTA GAATCTGGGG ACTGGTTAGA TGATCCAGAA TTTAATATTT 10260 

CTTATAGTCT TAAAGAAAAA GAGATCAAAC AGGAAGGTAG ACTCTTTGCA AAAATGACAT 10320 

ACAAAATGAG AGCTACACAA GTTTTATCAG AGACACTACT TGCAAATAAC ATAGGAAAAT 10380 

TCTTTCAAGA AAATGGGATG GTGAAGGGAG AGATTGAATT ACTTAAGAGA TTAACAACCA 10440 

TATCAATATC AGGAGTTCCA CGGTATAATG AAGTGTACAA TAATTCTAAA AGCCATACAG 10500

ATGACCTTAA AACCTACAAT AAAATAAGTA ATCTTAATTT GTCTTCTAAT CAGAAATCAA 10560 

AGAAATTTGA ATTCAAGTCA ACGGATATCT ACAATGATGG ATACGAGACT GTGAGCTGTT 10620 

TCCTAACAAC AGATCTCAAA AAATACTGTC TTAATTGGAG ATATGAATCA ACAGCTCTAT 10680 

TTGGAGAAAC TTGCAACCAA ATATTTGGAT TAAATAAATT GTTTAATTGG TTACACCCTC 10740 

GTCTTGAAGG AAGTACAATC TATGTAGGTG ATCCTTACTG TCCTCCATCA GATAAAGAAC 10800 

ATATATCATT AGAGGATCAC CCTGATTCTG GTTTTTACGT TCATAACCCA AGAGGGGGTA 10860 

TAGAAGGATT TTGTCAAAAA TTATGGACAC TCATATCTAT AAGTGCAATA CATCTAGCAG 10920 

CTGTTAGAAT AGGCGTGAGG GTGACTGCAA TGGTTCAAGG AGACAATCAA GCTATAGCTG 10980 

TAACCACAAG AGTACCCAAC AATTATGACT ACAGAGTTAA GAAGGAGATA GTTTATAAAG 11040 

ATGTAGTGAG ATTTTTTGAT TCATTAAGAG AAGTGATGGA TGATCTAGGT CATGAACTTA 11100 

AATTAAATGA AACGATTATA AGTAGCAAGA TGTTCATATA TAGCAAAAGA ATCTATTATG 11160 

ATGGGAGAAT TCTTCCTCAA GCTCTAAAAG CATTATCTAG ATGTGTCTTC TGGTCAGAGA 11220 

CAGTAATAGA CGAAACAAGA TCAGCATCTT CAAATTTGGC AACATCATTT GCAAAAGCAA 11280

TTGAGAATGG TTATTCACCT GTTCTAGGAT ATGCATGCTC AATTTTTAAG AACATTCAAC 11340 

AACTATATAT TGCCCTTGGG ATGAATATCA ATCCAACTAT AACACAGAAT ATCAGAGATC 11400 
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AGTATTTTAG GAATCCAAAT TGGATGCAAT ATGCCTCTTT AATACCTGCT AGTGTTGGGG 114 60 

GATTCAATCA CATGGCCATG TCAAGATGTT TTGTAAGGAA TATTGGTGAT CCATCAGTTG 11520 

CCGCATTGGC TGATATTAAA AGATTTATTA AGGCGAATCT ATTAGACCGA AGTGTTCTTT 11580 

ATAGGATTAT GAATCAAGAA CCAGGTGAGT CATCTTTTTT TGACTGGGCT TCAGATCCAT 11640

ATTCATGCAA TTTACCACAA TCTCAAAATA TAACCACCAT GATAAAAAAT ATAACAGCAA 11700

GGAATGTATT ACAAGATTCA CCAAATCCAT TATTATCTGG ATTATTCACA AATACAATGA 11760 

TAGAAGAAGA TGAAGAATTA GCTGAGTTCC TGATGGACAG GAAGGTAATT CTCCCTAGAG 11820 

TTGCACATGA TATTCTAGAT AATTCTCTCA CAGGAATTAG AAATGCCATA GCTGGAATGT 11880 

TAGATACGAC AAAATCACTA ATTCGGGTTG GCATAAATAG AGGAGGACTG ACATATAGTT 11940 

TGTTGAGGAA AATCAGTAAT TACGATCTAG TACAATATGA AACACTAAGT AGGACTTTGC 12000

GACTAATTGT AAGTGATAAA ATCAAGTATG AAGATATGTG TTCGGTAGAC CTTGCCATAG 12060

CATTGCGACA AAAGATGTGG ATTCATTTAT CAGGAGGAAG GATGATAAGT GGACTTGAAA 12120

CGCCTGACCC ATTAGAATTA CTATCTGGGG TAGTAATAAC AGGATCAGAA CATTGTAAAA 12180 

TATGTTATTC TTCAGATGGC ACAAACCCAT ATACTTGGAT GTATTTACCC GGTAATATCA 12240 

AAATAGGATC AGCAGAAACA GGTATATCGT CATTAAGAGT TCCTTATTTT GGATCAGTCA 12300

CTGATGAAAG ATCTGAAGCA CAATTAGGAT ATATCAAGAA TCTTAGTAAA CCTGCAAAAG 12360 

CCGCAATAAG AATAGCAATG ATATATACAT GGGCATTTGG TAATGATGAG ATATCTTGGA 12420

TGGAAGCCTC ACAGATAGCA CAAACACGTG CAAATTTTAC ACTAGATAGT CTCAAAATTT 12480 

TAACACCGGT AGCTACATCA ACAAATTTAT CACACAGATT TAAGGATACT GCAACTCAGA 12540

TGAAATTCTC CAGTACATCA TTGATCAGAG TCAGCAGATT TATAACAATG TCCAATGATA 12600 

ACATGTCTAT CAAAGAAGCT AATGAAACCA AAGATACTAA TCTTATTTAT CAACAAATAA 12660 

TGTTAACAGG ATTAAGTGTT TTCGAATATT TATTTAGATT AAAAGAAACC ACAGGACACA 12720

ACCCTATAGT TATGCATCTG CACATAGAAG ATGAGTGTTG TATTAAAGAA AGTTTTAATG 12780

ATGAACATAT TAATCCAGAG TCTACATTAG AATTAATTCG ATATCCTGAA AGTAATGAAT 12840 

TTATTTATGA TAAAGACCCA CTCAAAGATG TGGACTTATC AAAACTTATG GTTATTAAAG 12900 

ACCATTCTTA CACAATTGAT ATGAATTATT GGGATGATAC TGACATCATA CATGCAATTT 12960

CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA 13020

AAGAGATAAT AGTTATTGCA AATGATGATG ATATTAATAG CTTAATCACT GAATTTTTGA 13080

CTCTTGACAT ACTTGTATTT CTCAAGACAT TTGGTGGATT ATTAGTAAAT CAATTTGCAT 13140

ACACTCTTTA TAGTCTAAAA ATAGAAGGTA GGGATCTCAT TTGGGATTAT ATAATGAGAA 13200

CACTGAGAGA TACTTCCCAT TCAATATTAA AAGTATTATC TAATGCATTA TCTCATCCTA 13260

AAGTATTCAA GAGGTTCTGG GATTGTGGAG TTTTAAACCC TATTTATGGT CCTAATATTG 13320

CTAGTCAAGA CCAGATAAAA CTTGCCCTAT CTATATGTGA ATATTCACTA GATCTATTTA 13380

TGAGAGAATG GTTGAATGGT GTATCACTTG AAATATACAT TTGTGACAGC GATATGGAAG 13440

TTGCAAATGA TAGGAAACAA GCCTTTATTT CTAGACACCT TTCATTTGTT TGTTGTTTAG 13500

CAGAAATTGC ATCTTTCGGA CCTAACCTGT TAAACTTAAC ATACTTGGAG AGACTTGATC 13560

TATTGAAACA ATATCTTGAA TTAAATATTA AAGAAGACCC TACTCTTAAA TATGTACAAA 13620

TATCTGGATT ATTAATTAAA TCGTTCCCAT CAACTGTAAC ATACGTAAGA AAGACTGCAA 13680

TCAAATATCT AAGGATTCGC GGTATTAGTC CACCTGAGGT AATTGATGAT TGGGATCCGG 13740

TAGAAGATGA AAATATGCTG GATAACATTG TCAAAACTAT AAATGATAAC TGTAATAAAG 13800

ATAATAAAGG GAATAAAATT AACAATTTCT GGGGACTAGC ACTTAAGAAC TATCAAGTCC 13860

TTAAAATCAG ATCTATAACA AGTGATTCTG ATGATAATGA TAGACTAGAT GCTAATACAA 13920

GTGGTTTGAC ACTTCCTCAA GGAGGGAATT ATCTATCGCA TCAATTGAGA TTATTCGGAA 13980

TCAACAGCAC TAGTTGTCTG AAAGCTCTTG AGTTATCACA AATTTTAATG AAGGAAGTCA 14040

ATAAAGACAA GGACAGGCTC TTCCTGGGAG AAGGAGCAGG AGCTATGCTA GCATGTTATG 14100

ATGCCACATT AGGACCTGCA GTTAATTATT ATAATTCAGG TTTGAATATA ACAGATGTAA 14160

TTGGTCAACG AGAATTGAAA ATATTTCCTT CAGAGGTATC ATTAGTAGGT AAAAAATTAG 14220

GAAATGTGAC ACAGATTCTT AACAGGGTAA AAGTACTGTT CAATGGGAAT CCTAATTCAA 14280

CATGGATAGG AAATATGGAA TGTGAGAGCT TAATATGGAG TGAATTAAAT GATAAGTCCA 14340

TTGGATTAGT ACATTGTGAT ATGGAAGGAG CTATCGGTAA ATCAGAAGAA ACTGTTCTAC 14400

ATGAACATTA TAGTGTTATA AGAATTACAT ACTTGATTGG GGATGATGAT GTTGTTTTAG 14460

TTTCCAAAAT TATACCTACA ATCACTCCGA ATTGGTCTAG AATACTTTAT CTATATAAAT 14520

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TATATTGGAA AGATGTAAGT ATAATATCAC TCAAAACTTC TAATCCTGCA TCAACAGAAT 14580 

TATATCTAAT TTCGAAAGAT GCATATTGTA CTATAATGGA ACCTAGTGAA ATTGTTTTAT 14640 

CAAAACTTAA AAGATTGTCA CTCTTGGAAG AAAATAATCT ATTAAAATGG ATCATTTTAT 14700 

CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCAA AGAAGGAGAA AGAGATTATG 14760 

GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC 14820 

ATCTGGCGAA AGAATTTTTA TCAACCCCAG ATCTGACTAA TATCAACAAT ATAATCCAAA 14880

GTTTTCAGCG AACAATAAAG GATGTTTTAT TTGAATGGAT TAATATAACT CATGATGATA 14940

AGAGACATAA ATTAGGCGGA AGATATAACA TATTCCCACT GAAAAATAAG GGAAAGTTAA 15000

GACTGCTATC GAGAAGACTA GTATTAAGTT GGATTTCATT ATCATTATCG ACTCGATTAC 15060 

TTACAGGTCG CTTTCCTGAT GAAAAATTTG AACATAGAGC ACAGACTGGA TATGTATCAT 15120 

TAGCTGATAC TGATTTAGAA TCATTAAAGT TATTGTCGAA AAACATCATT AAGAATTACA 15180 

GAGAGTGTAT AGGATCAATA TCATATTGGT TTCTAACCAA AGAAGTTAAA ATACTTATGA 15240 

AATTGATTGG TGGTGCTAAA TTATTAGGAA TTCCCAGACA ATATAAAGAA CCCGAAGACC 15300 

AGTTATTAGA AAACTACAAT CAACATGATG AATTTGATAT CGATTAAAAC ATAAATACAA 15360 

TGAAGATATA TCCTAACCTT TATCTTTAAG CCTAGGAATA GACAAAAAGT AAGAAAAACA 15420 

TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT 15462 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2233 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro
1 5 10 15

Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln Leu
20 25 30

His Thr Ile Met Ser Leu Pro Gln Pro Tyr Asp Met Asp Asp Asp Ser
35 40 45

Ile Leu Val Ile Thr Arg Gln Lys Ile Lys Leu Asn Lys Leu Asp Lys
50 55 60

Arg Gln Arg Ser Ile Arg Arg Leu Lys Leu Ile Leu Thr Glu Lys Val
65 70 75 80

Asn Asp Leu Gly Lys Tyr Thr Phe Ile Arg Tyr Pro Glu Met Ser Lys
85 90 95

Glu Met Phe Lys Leu Tyr Ile Pro Gly Ile Asn Ser Lys Val Thr Glu
100 105 110

Leu Leu Leu Lys Ala Asp Arg Thr Tyr Ser Gln Met Thr Asp Gly Leu
115 120 125

Arg Asp Leu Trp Ile Asn Val Leu Ser Lys Leu Ala Ser Lys Asn Asp
130 135 140

Gly Ser Asn Tyr Asp Leu Asn Glu Glu Ile Asn Asn Ile Ser Lys Val
145 150 155 160

His Thr Thr Tyr Lys Ser Asp Lys Trp Tyr Asn Pro Phe Lys Thr Trp
165 170 175

Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gln Lys Ala Arg Asn Glu
180 185 190

Ile Thr Phe Asn Val Gly Lys Asp Tyr Asn Leu Leu Glu Asp Gln Lys
195 200 205

Asn Phe Leu Leu Ile His Pro Glu Leu Val Leu Ile Leu Asp Lys Gln
210 215 220

Asn Tyr Asn Gly Tyr Leu Ile Thr Pro Glu Leu Val Leu Met Tyr Cys
225 230 235 240

Asp Val Val Glu Gly Arg Trp Asn Ile Ser Ala Cys Ala Lys Leu Asp
245 250 255

Pro Lys Leu Gln Ser Met Tyr Gln Lys Gly Asn Asn Leu Trp Glu Val
260 265 270

Ile Asp Lys Leu Phe Pro Ile Met Gly Glu Lys Thr Phe Asp Val Ile
275 280 285

Ser Leu Leu Glu Pro Leu Ala Leu Ser Leu Ile Gln Thr His Asp Pro
290 295 300

Val Lys Gln Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met
305 310 315 320

Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val
325 330 335

Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp
340 345 350

Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro
355 360 365

Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile
370 375 380

Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala Ile Phe
385 390 395 400

Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp
405 410 415

Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala
420 425 430

Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr
435 440 445

Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu
450 455 460

Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys
465 470 475 480

Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg
485 490 495

Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala
500 505 510

Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly
515 520 525

Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu
530 535 540

Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys
545 550 555 560

Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile
565 570 575

Gly Lys Phe Phe Gln Glu Asn Gly Met Val Lys Gly Glu Ile Glu Leu
580 585 590

Leu Lys Arg Leu Thr Thr Ile Ser Ile Ser Gly Val Pro Arg Tyr Asn
595 600 605

Glu Val Tyr Asn Asn Ser Lys Ser His Thr Asp Asp Leu Lys Thr Tyr
610 615 620

Asn Lys Ile Ser Asn Leu Asn Leu Ser Ser Asn Gln Lys Ser Lys Lys
625 630 635 640

Phe Glu Phe Lys Ser Thr Asp Ile Tyr Asn Asp Gly Tyr Glu Thr Val
645 650 655

Ser Cys Phe Leu Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Ser Thr Ala Leu Phe Gly Glu Thr Cys Asn Gln Ile Phe Gly
675 680 685

Leu Asn Lys Leu Phe Asn Trp Leu His Pro Arg Leu Glu Gly Ser Thr
690 695 700

Ile Tyr Val Gly Asp Pro Tyr Cys Pro Pro Ser Asp Lys Glu His Ile
705 710 715 720

Ser Leu Glu Asp His Pro Asp Ser Gly Phe Tyr Val His Asn Pro Arg
725 730 735

Gly Gly Ile Glu Gly Phe Cys Gln Lys Leu Trp Thr Leu Ile Ser Ile
740 745 750

Ser Ala Ile His Leu Ala Ala Val Arg Ile Gly Val Arg Val Thr Ala
755 760 765

Met Val Gln Gly Asp Asn Gln Ala Ile Ala Val Thr Thr Arg Val Pro
770 775 780

Asn Asn Tyr Asp Tyr Arg Val Lys Lys Glu Ile Val Tyr Lys Asp Val
785 790 795 800

Val Arg Phe Phe Asp Ser Leu Arg Glu Val Met Asp Asp Leu Gly His
805 810 815

Glu Leu Lys Leu Asn Glu Thr Ile Ile Ser Ser Lys Met Phe Ile Tyr
820 825 830

Ser Lys Arg Ile Tyr Tyr Asp Gly Arg Ile Leu Pro Gln Ala Leu Lys
835 840 845

Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr






850 855 860 

Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu
865 870 875 880

Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn
885 890 895

Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile
900 905 910

Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln
915 920 925

Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn His Met Ala
930 935 940

Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala
945 950 955 960

Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser
965 970 975

Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Phe
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn
995 1000 1005

Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp
1010 1015 1020

Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu
1025 1030 1035 1040

Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu
1045 1050 1055

Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg
1060 1065 1070

Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val
1075 1080 1085

Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser
1090 1095 1100

Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu
1105 1110 1115 1120

Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu
1125 1130 1135

Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg
1140 1145 1150

Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly
1155 1160 1165

Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp
1170 1175 1180

Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile
1185 1190 1195 1200

Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly
1205 1210 1215

Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn
1220 1225 1230

Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr
1235 1240 1245

Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile
1250 1255 1260

Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr
1265 1270 1275 1280

Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Phe Lys Asp Thr Ala
1285 1290 1295

Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe
1300 1305 1310

Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr
1315 1320 1325

Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser
1330 1335 1340

Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro
1345 1350 1355 1360

Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser
1365 1370 1375

Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg
1380 1385 1390

Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp
1395 1400 1405






Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile
1410 1415 1420

Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile
1425 1430 1435 1440

Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp
1445 1450 1455

Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser
1460 1465 1470

Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr
1475 1480 1485

Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu
1490 1495 1500

Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu
1505 1510 1515 1520

Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser
1525 1530 1535

His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro
1540 1545 1550

Ile Tyr Gly Pro Asn Ile Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu
1555 1560 1565

Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn
1570 1575 1580

Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala
1585 1590 1595 1600

Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys
1605 1610 1615

Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr
1620 1625 1630

Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile
1635 1640 1645

Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile
1650 1655 1660

Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys
1665 1670 1675 1680

Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp
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1685 1690 1695 

Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile
1700 1705 1710

Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe
1715 1720 1725

Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile
1730 1735 1740

Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly
1745 1750 1755 1760

Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu
1765 1770 1775

Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln
1780 1785 1790

Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly
1795 1800 1805

Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro
1810 1815 1820

Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly
1825 1830 1835 1840

Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys
1845 1850 1855

Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe
1860 1865 1870

Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser
1875 1880 1885

Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys
1890 1895 1900

Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu
1905 1910 1915 1920

His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val
1925 1930 1935

Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg
1940 1945 1950

Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser
1955 1960 1965

Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys
1970 1975 1980

Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys
1985 1990 1995 2000

Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile
2005 2010 2015

Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys
2020 2025 2030

Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu
2035 2040 2045

Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe
2050 2055 2060

Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe
2065 2070 2075 2080

Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His
2085 2090 2095

Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu
2100 2105 2110

Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser
2115 2120 2125

Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro
2130 2135 2140

Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala
2145 2150 2155 2160

Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys
2165 2170 2175

Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys
2180 2185 2190

Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly
2195 2200 2205

Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr
2210 2215 2220

Asn Gln His Asp Glu Phe Asp Ile Asp
2225 2230

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15218 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
ACGCGAAAAA ATGCGTACTA CAAACTTGCA CATTCGAAAA AAATGGGGCA AATAAGAACT 60

TGATAAGTGC TATTTAAGTC TAACCTTTTC AATCAGAAAT GGGGTGCAAT TCACTGAGCA 120

TGATAAAGGT TAGATTACAA AATTTATTTG ACAATGACGA AGTAGCATTG TTAAAAATAA 180

CATGTTATAC TGATAAATTA ATTCTTCTGA CCAATGCATT AGCCAAAGCA GCAATACATA 240

CAATTAAATT AAACGGCATA GTTTTTATAC ATGTTATAAC AAGCAGTGAA GTGTGCCCTG 300

ATAACAATAT TGTAGTGAAA TCTAACTTTA CAACAATGCC AATACTACAA AATGGAGGAT 360

ACATATGGGA ATTGATTGAG TTGACACACT GCTCTCAATT AAACGGTTTA ATGGATGATA 420

ATTGTGAAAT CAAATTTTCT AAAAGACTAA GTGACTCAGT AATGACTAAT TATATGAATC 480

AAATATCTGA CTTACTTGGG CTTGATCTCA ATTCATGAAT TATGTTTAGT CTAATTCAAT 540

AGACATGTGT TTATTACCAT TTTAGTTAAT ATAAAAACTC ATCAAAGGGA AATGGGGCAA 600

ATAAACTCAC CTAATCAATC AAACCATGAG CACTACAAAT GACAACACTA CTATGCAAAG 660

ATTGATGATC ACAGACATGA GACCCCTGTC AATGGATTCA ATAATAACAT CTCTTACCAA 720

AGAAATCATC ACACACAAAT TCATATACTT GATAAACAAT GAATGTATTG TAAGAAAACT 780

TGATGAAAGA CAAGCTACAT TTACATTCTT AGTCAATTAT GAGATGAAGC TACTGCACAA 840

AGTAGGGAGT ACCAAATACA AAAAATACAC TGAATATAAT ACAAAATATG GCACTTTCCC 900

CATGCCTATA TTTATCAATC ACGGCGGGTT TCTAGAATGT ATTGGCATTA AGCCTACAAA 960

ACACACTCCT ATAATATACA AATATGACCT CAACCCGTGA ATTCCAACAA AAAAACCAAC 1020

CCAACCAAAC CAAACTATTC CTCAAACAAC AGTGCTCAAT AGTTAAGAAG GAGCTAATCC 1080

ATTTTAGTAA TTAAAAATAA AAGTAAAGCC AATAACATAA ATTGGGGCAA ATACAAAGAT 1140

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GGCTCTTAGC AAAGTCAAGT TGAATGATAC ATTAAATAAG GATCAGCTGC TGTCATCCAG 1200 

CAAATACACT ATTCAACGTA GTACAGGAGA TAATATTGAC ACTCCCAATT ATGATGTGCA 1260 

AAAACACCTA AACAAACTAT GTGGTATGCT ATTAATCACT GAAGATGCAA ATCATAAATT 1320 

CACAGGATTA ATAGGTATGT TATATGCTAT GTCCAGGTTA GGAAGGGAAG ACACTATAAA 1380 

GATACTTAAA GATGCTGGAT ATCATGTTAA AGCTAATGGA GTAGATATAA CAACATATCG 1440

TCAAGATATA AATGGAAAGG AAATGAAATT CGAAGTATTA ACATTATCAA GCTTGACATC 1500

AGAAATACAA GTCAATATTG AGATAGAATC TAGAAAGTCC TACAAAAAAA TGCTAAAAGA 1560 

GATGGGAGAA GTGGCTCCAG AATATAGGCA TGATTCTCCA GACTGTGGGA TGATAATACT 1620 

GTGTATAGCT GCACTTGTGA TAACCAAATT AGCAGCAGGA GACAGATCAG GTCTTACAGC 1680 

AGTAATTAGG AGGGCAAACA ATGTCTTAAA AAACGAAATA AAACGATACA AGGGCCTCAT 1740

ACCAAAGGAT ATAGCTAACA GTTTTTATGA AGTGTTTGAA AAACACCCTC ATCTTATAGA 1800

TGTTTTCGTG CACTTTGGCA TTGCACAATC ATCCACAAGA GGGGGTAGTA GAGTTGAAGG 1860 

AATCTTTGCA GGATTGTTTA TGAATGCCTA TGGTTCAGGG CAAGTAATGC TAAGATGGGG 1920

AGTTTTAGCC AAATCTGTAA AAAATATCAT GCTAGGACAT GCTAGTGTCC AGGCAGAAAT 1980 

GGAGCAAGTT GTGGAAGTCT ATGAGTATGC ACAGAAGTTG GGAGGAGAAG CTGGATTCTA 2040 

CCATATATTG AACAATCCAA AAGCATCATT GCTGTCATTA ACTCAATTTC CCAACTTCTC 2100 

AAGTGTGGTC CTAGGCAATG CAGCAGGTCT AGGCATAATG GGAGAGTATA GAGGTACACC 2160 

AAGAAACCAG GATCTTTATG ATGCAGCTAA AGCATATGCA GAGCAACTCA AAGAAAATGG 2220 

AGTAATAAAC TACAGTGTAT TAGACTTAAC AGCAGAAGAA TTGGAAGCCA TAAAGCATCA 2280

ACTCAACCCC AAAGAAGATG ATGTAGAGCT TTAAGTTAAC AAAAAATACG GGGCAAATAA 2340 

GTCAACATGG AGAAGTTTGC ACCTGAATTT CATGGAGAAG ATGCAAATAA CAAAGCTACC 2400 

AAATTCCTAG AATCAATAAA GGGCAAGTTC GCATCATCCA AAGATCCTAA GAAGAAAGAT 2460

AGCATAATAT CTGTTAACTC AATAGATATA GAAGTAACTA AAGAGAGCCC GATAACATCT 2520 

GGCACCAACA TCATCAATCC AACAAGTGAA GCCGACAGTA CCCCAGAAAC AAAAGCCAAC 2580 

TACCCAAGAA AACCCCTAGT AAGCTTCAAA GAAGATCTCA CCCCAAGTGA CAACCCTTTT 2640

TCTAAGTTGT ACAAGGAAAC AATAGAAACA TTTGATAACA ATGAAGAAGA ATCTAGCTAC 2700

TCATATGAAG AGATAAATGA TCAAACAAAT GACAACATTA CAGCAAGACT AGATAGAATT 2760

GATGAAAAAT TAAGTGAAAT ATTAGGAATG CTCCATACAT TAGTAGTTGC AAGTGCAGGA 2820 

CCCACTTCAG CTCGCGATGG AATAAGAGAT GCTATGGTTG GTCTAAGAGA AGAGATGATA 2880 

GAAAAAATAA GAGCGGAAGC ATTAATGACC AATGATAGGT TAGAGGCTAT GGCAAGACTT 2940 

AGGAATGAGG AAAGCGAAAA AATGGCAAAA GACACCTCAG ATGAAGTGTC TCTTAATCCA 3000 

ACTTCCAAAA AATTGAGTGA CTTGTTGGAA GACAACGATA GTGACAATGA TCTATCACTT 3060 

GATGATTTTT GATCAGCGAT CAACTCACTC AGCAATCAAC AACATCAATA AAACAGACAT 3120

CAATCCATTG AATCAACTGC CAGACCGAAC AAACAAACGT CCATCAGTAG AACCACCAAC 3180 

CAATCAATCA AC CAATTGAT CAATCAGCAA CCCGACAAAA TTAACAATAT AGTAACAAAA 3240 

AAAGAACAAG ATGGGGCAAA TATGGAAACA TACGTGAACA AGCTTCACGA AGGCTCCACA 3300

TACACAGCAG CTGTTCAGTA CAATGTTCTA GAAAAAGATG ATGATCCTGC ATCACTAACA 3360

ATATGGGTGC CTATGTTCCA GTCATCTGTG CCAGCAGACT TGCTCATAAA AGAACTTGCA 3420 

AGCATCAATA TACTAGTGAA GCAGATCTCT ACGCCCAAAG GACCTTCACT ACGAGTCACG 3480 

ATTAACTCAA GAAGTGCTGT GCTGGCTCAA ATGCCTAGTA ATTTCATCAT AAGCGCAAAT 3540 

GTATCATTAG ATGAAAGAAG CAAATTAGCA TATGATGTAA CTACACCTTG TGAAATCAAA 3600 

GCATGCAGTC TAACATGCTT AAAAGTAAAA AGTATGTTAA CTACAGTCAA AGATCTTACC 3660

ATGAAGACAT TCAACCCCAC TCATGAGATC ATTGCTCTAT GTGAATTTGA AAATATTATG 3720

ACATCAAAAA GAGTAATAAT ACCAACCTAT CTAAGATCAA TTAGTGTCAA GAACAAGGAT 3780 

CTGAACTCAC TAGAAAATAT AGCAACCACC GAATTCAAAA ATGCTATCAC CAATGCAAAA 3840 

ATTATTCCTT ATGCAGGATT AGTGTTAGTT ATCACAGTTA CTGACAATAA AGGAGCATTC 3900 

AAATATATCA AACCACAGAG TCAATTTATA GTAGATCTTG GTGCCTACCT AGAAAAAGAG 3960

AGCATATATT ATGTGACTAC TAATTGGAAG CATACAGCTA CACGTTTTTC AATCAAACCA 4020 

CTAGAGGATT AAACTTAATT ATCAACACTG AATGACAGGT CCACATATAT CCTCAAACTA 4080 

CACACTATAT CCAAACATCA TAAACATCTA CACTACACAC TTCATCACAC AAACCAATCC 4140 

CACTCAAAAT CCAAAATCAC TACCAGCCAC TATCTGCTAG ACCTAGAGTG CGAATAGGTA 4200

AATAAAACCA AAATATGGGG TAAATAGACA TTAGTTAGAG TTCAATCAAT CTTAACAACC 4260

ATTTATACCG CCAATTCAAC ACATATACTA TAAATCTTAA AATGGGAAAT ACATCCATCA 4320

CAATAGAATT CACAAGCAAA TTTTGGCCCT ATTTTACACT AATACATATG ATCTTAACTC 4380 

TAATCTTTTT ACTAATTATA ATCACTATTA TGATTGCAAT ACTAAATAAG CTAAGTGAAC 4440 

ATAAAGCATT CTGTAACAAA ACTCTTGAAC TAGGACAGAT GTATCAAATC AACACATAGA 4500 

GTTCTACCAT TATGCTGTGT CAAATTATAA TCCTGTATAT ATAAACAAAC AAATCCAATC 4560

TTCTCACAGA GTCATGGTGT CGCAAAACCA CGCTAACTAT CATGGTAGCA TAGAGTAGTT 4620

ATTTAAAAAT TAACATAATG ATGAATTGTT AGTATGAGAT CAAAAACAAC ATTGGGGCAA 4680

ATGCAACCAT GTCCAAACAC AAGAATCAAC GCACTGCCAG GACTCTAGAA AAGACCTGGG 4740

ATACTCTTAA TCATCTAATT GTAATATCCT CTTGTTTATA CAGATTAAAT TTAAAATCTA 4800

TAGCACAAAT AGCACTATCA GTTTTGGCAA TGATAATCTC AACCTCTCTC ATAATTGCAG 4860 

CCATAATATT CATCATCTCT GCCAATCACA AAGTTACACT AACAACGGTC ACAGTTCAAA 4920 

CAATAAAAAA CCACACTGAA AAAAACATCA GCAGCTACCG TACTCAAGTC TCACCAGAAA 4980 

GGGTTAGTTC ATCCAAGCAA CCCACAACCA CATCACCAAT CCACACAAGT TCAGCTACAA 5040 

CATCACCCAA TACAAAATCA GAAACACACC ATACAACAGC ACAAACCAAA GGCAGAACCA 5100 

CCACTTCAAC ACAGACCAAC AAGCCAAGCA CAAAACCACG TCCAAAAAAT CCACCAAAAA 5160 

AAGATGATTA CCATTTTGAA GTGTTCAACT TCGTTCCCTG CAGTATATGT GGCAACAATC 5220

AACTTTGCAA ATCCATCTGC AAAACAATAC CAAGCAACAA ACCAAAGAAG AAACCAACCA 5280 

TCAAACCCAC AAACAAACCA ACCACCAAAA CCACAAACAA AAGAGACCCA AAAACACCAG 5340

CCAAAACGAC GAAAAAAGAA ACTACCACCA ACCCAACAAA AAAACTAACC CTCAAGACCA 5400 

CAGAAAGAGA CACCAGCACC TCACAATCCA CTGCACTCGA CACAACCACA TTAAAACACA 5460 

CAGTCCAACA GCAATCCCTC CTCTCAACCA CCCCCGAAAA CACACCCAAC TCCACACAAA 5520 

CACCCACAGC ATCCGAGCCC TCCACACCAA ACTCCACCCA AAAAACCCAG CCACATGCTT 5580 

AGTTATTCAA AAACTACATC TTAGCAGAGA ACCGTGATCT ATCAAGCAAG AACGAAATTA 5640

AACCTGGGGC AAATAACCAT GGAGTTGATG ATCCACAAGT CAAGTGCAAT CTTCCTAACT 5700 

CTTGCTATTA ATGCATTGTA CCTCACCTCA AGTCAGAACA TAACTGAGGA GTTTTACCAA 5760 

TCGACATGTA GTGCAGTTAG CAGAGGTTAT TTTAGTGCTT TAAGAACAGG TTGGTATACT 5820

AGTGTCATAA CAATAGAATT AAGTAATATA AAAGAAACCA AATGCAATGG AACTGACACT 5880

AAAGTAAAAC TTATGAAACA AGAATTAGAT AAGTATAAGA ATGCAGTAAC AGAATTACAG 5940

CTACTTATGC AAAACACACC AGCTGTCAAC AACCGGGCCA GAAGAGAAGC ACCACAGTAT 6000

ATGAACTACA CAATCAATAC CACTAAAAAC CTAAATGTAT CAATAAGCAA GAAGAGGAAA 6060

CGAAGATTTC TAGGCTTCTT GTTAGGTGTG GGATCTGCAA TAGCAAGTGG TATAGCTGTA 6120

TCAAAAGTTC TACACCTTGA AGGAGAAGTG AACAAGATCA AAAATGCTTT GTTGTCTACA 6180

AACAAAGCTG TAGTCAGTTT ATCAAATGGG GTCAGTGTTT TAACCAGCAA AGTGTTAGAT 6240

CTCAAGAATT ACATAAATAA CCAATTATTA CCCATAGTAA ATCAACAGAG CTGTCGCATC 6300

TCCAACATTG AAACAGTTAT AGAATTCCAG CAGAAGAACA GCAGATTGTT GGAAATCACC 6360

AGAGAATTTA GTGTCAATGC AGGTGTAACA ACACCTTTAA GCACTTACAT GTTGACAAAC 6420

AGTGAGTTAC TATCATTAAT CAATGATATG CCTATAACAA ATGATCAGAA AAAATTAATG 6480

TCAAGCAATG TTCAGATAGT AAGGCAACAA AGTTATTCCA TCATGTCTAT AATAAAGGAA 6540

GAAGTCCTTG CATATGTTGT ACAGCTGCCT ATCTATGGTG TAATAGATAC ACCTTGCTGG 6600

AAATTGCACA CATCGCCTCT ATGCACTACC AACATCAAAG AAGGATCAAA TATTTGTTTA 6660

ACAAGGACTG ATAGAGGATG GTATTGTGAT AATGCAGGAT CAGTATCCTT CTTTCCACAG 6720

GCTGACACTT GTAAAGTACA GTCCAATCGA GTATTTTGTG ACACTATGAA CAGTTTGACA 6780

TTACCAAGTG AAGTCAGCCT TTGTAACACT GACATATTCA ATTCCAAGTA TGACTGCAAA 6840

ATTATGACAT CAAAAACAGA CATAAGCAGC TCAGTAATTA CTTCTCTTGG AGCTATAGTG 6900

TCATGCTATG GTAAAACTAA ATGCACTGCA TCCAACAAAA ATCGTGGGAT TATAAAGACA 6960

TTTTCTAATG GTTGTGACTA TGTGTCAAAC AAAGGAGTAG ATACTGTGTC AGTGGGCAAC 7020

ACTTTATACT ATGTAAACAA GCTGGAAGGC AAGAACCTTT ATGTAAAAGG GGAACCTATA 7080

ATAAATTACT ATGACCCTCT AGTGTTTCCT TCTGATGAGT TTGATGCATC AATATCTCAA 7140

GTCAATGAAA AAATCAATCA AAGTTTAGCT TTTATTCGTA GATCTGATGA ATTACTACAT 7200

AATGTAAATA CTGGCAAATC TACTACAAAT ATTATGATAA CTACAATTAT TATAGTAATC 7260

ATTGTAGTAT TGTTATCATT AATAGCTATT GGTTTACTGT TGTATTGTAA AGCCAAAAAC 7320

ACACCAGTTA CACTAAGCAA AGACCAACTA AGTGGAATCA ATAATATTGC ATTCAGCAAA 7380

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TAGACAAAAA ACCACCTGAT CATGTTTCAA CAACAATCTG CTGACCACCA ATCCCAAATC 7440 

AACTTACAAC AAATATTTCA ACATCACAGT ACAGGCTGAA TCATTTCCTC ACATCATGCT 7500 

ACCCACATAA CTAAGCTAGA TCCTTAACTT ATAGTTACAT AAAAACCTCA AGTATCACAA 7560 

TCAACCACTA AATCAACACA TCATTCACAA AATTAACAGC TGGGGCAAAT ATGTCGCGAA 7620 

GAAATCCTTG TAAATTTGAG ATTAGAGGTC ATTGCTTGAA TGGTAGAAGA TGTCACTACA 7680 

GTCATAATTA CTTTGAATGG CCTCCTCATG CATTACTAGT GAGGCAAAAC TTCATGTTAA 7740

ACAAGATACT CAAGTCAATG GACAAAAGCA TAGACACTTT GTCTGAAATA AGTGGAGCTG 7800

CTGAACTGGA TAGAACAGAA GAATATGCTC TTGGTATAGT TGGAGTGCTA GAGAGTTACA 7860

TAGGATCTAT AAACAACATA ACAAAACAAT CAGCATGTGT TGCTATGAGT AAACTTCTTA 7920

TTGAGATCAA TAGTGATGAC ATTAAAAAGC TTAGAGATAA TGAAGAACCC AATTCACCTA 7980

AGATAAGAGT GTACAATACT GTTATATCAT ACATTGAGAG CAATAGAAAA AACAACAAGC 8040 

AAACCATCCA TCTGCTCAAG AGACTACCAG CAGACGTGCT GAAGAAGACA ATAAAGAACA 8100 

CATTAGATAT CCACAAAAGC ATAACCATAA GCAATCCAAA AGAGTCAACT GTGAATGATC 8160 

AAAATGACCA AACCAAAAAT AATGATATTA CCGGATAAAT ATCCTTGTAG TATATCATCC 8220

ATATTGATCT CAAGTGAAAG CATGGTTGCT ACATTCAATC ATAAAAACAT ATTACAATTT 8280

AACCATAACT ATTTGGATAA CCACCAGCGT TTATTAAATC ATATATTTGA TGAAATTCAT 8340

TGGACACCTA AAAACTTATT AGATGCCACT CAACAATTTC TCCAACATCT TAACATCCCT 8400

GAAGATATAT ATACAGTATA TATATTAGTG TCATAATGCT TGACCATAAC GACTCTATGT 8460

CATCCAACCA TAAAACTATT TTGATAAGGT TATGGGACAA AATGGATCCC ATTATTAATG 8520 

GAAACTCTGC TAATGTGTAT CTAACTGATA GTTATTTAAA AGGTGTTATC TCTTTTTCAG 8580 

AGTGTAATGC TTTAGGGAGT TATCTTTTTA ACGGCCCTTA TCTTAAAAAT GATTACACCA 8640

ACTTAATTAG TAGACAAAGC CCACTACTAG AGCATATGAA TCTTAAAAAA CTAACTATAA 8700 

CACAGTCATT AATATCTAGA TATCATAAAG GTGAACTGAA ATTAGAAGAA CCAACTTATT 8760 

TCCAGTCATT ACTTATGACA TATAAAAGTA TGTCCTCGTC TGAACAAATT GCTACAACTA 8820 

ACTTACTTAA AAAAATAATA CGAAGAGCCA TAGAAATAAG TGATGTAAAG GTGTACGCCA 8880 

TCTTGAATAA ACTAGGATTA AAGGAAAAGG ACAGAGTTAA GCCCAACAAT AATTCAGGTG 8940
ATGAAAACTC AGTACTTACA ACCATAATTA AAGATGATAT ACTTTCGGCT GTGGAAAACA 9000

ATCAATCATA TACAAATTCA GACAAAAGTC ACTCAGTAAA TCAAAATATC ACTATCAAAA 9060 

CAACACTCTT GAAAAAATTG ATGTGTTCAA TGCAACATCC TCCATCATGG TTAATACACT 9120 

GGTTCAATTT ATATACAAAA TTAAATAACA TATTAACACA ATATCGATCA AATGAGGTAA 9180 

AAAGTCATGG GTTTATATTA ATAGATAATC AAACTTTAAG TGGTTTTCAG TTTATTTTAA 9240 

ATCAATATGG TTGTATCGTT TATCATAAAG GACTCAAAAA AATCACAACT ACTACTTACA 9300

ATCAATTTTT GACATGGAAA GACATCAGCC TTAGCAGATT AAATGTTTGC TTAATTACTT 9360

GGATAAGTAA TTGTTTAAAT ACATTAAACA AAAGCTTAGG GCTGAGATGT GGATTCAATA 9420

ATGTTGTGTT ATCACAATTA TTTCTTTATG GAGATTGTAT ACTGAAATTA TTTCATAATG 9480

AAGGCTTCTA CATAATAAAA GAAGTAGAGG GATTTATTAT GTCTTTAATT CTAAACATAA 9540

CAGAAGAAGA TCAATTTAGG AAACGATTTT ATAATAGCAT GCTAAATAAC ATCACAGATG 9600

CAGCTATTAA GGCTCAAAAG GACCTACTAT GAAGAGTATG TCACACTTTA TTAGACAAGA 9660 

CAGTGTCTGA TAATATCATA AATGGTAAAT GGATAATCCT ATTAAGTAAA TTTCTTAAAT 9720

TGATTAAGCT TGCAGGTGAT AATAATCTCA ATAACTT GAG TGAGCTATAT TTTCTCTTCA 9780 

GAATCTTTGG ACATCCAATG GTCGATGAAA GACAAGCAAT GGATTCTGTA AGAATTAACT 9840

GTAATGAAAC TAAGTTCTAC TTATTAAGTA GTCTAAGTAC ATTAAGAGGT GCTTTCATTT 9900 

ATAGAATCAT AAAAGGGTTT GTAAATACCT ACAACAGATG GCCCACCTTA AGGAATGCTA 9960 

TTGTCCTACC TCTAAGATGG TTAAACTACT ATAAACTTAA TACTTATCCA TCTCTACTTG 10020

AAATCACAGA AAATGATTTG ATTATTTTAT CAGGATTGCG GTTCTATCGT GAGTTTCATC 10080 

TGCCTAAAAA AGTGGATCTT GAAATGATAA TAAATGACAA AGCCATTTCA CCTCCAAAAG 10140 

ATCTAATATG GACTAGTTTT CCTAGAAATT ACATGCCATC ACATATACAA AATTATATAG 10200

AACATGAAAA GTTGAAGTTC TCTGAAAGCG ACAGATCGAG AAGAGTACTA GAGTATTACT 10260 

TGAGAGATAA TAAATTCAAT GAATGCGATC TATACAATTG TGTAGTCAAT CAAAGCTATC 10320 

TCAACAACTC TAATCACGTG GTATCACTAA CTGGTAAAGA AAGAGAGCTC AGTGTAGGTA 10380 

GAATGTTTGC TATGCAACCA GGTATGTTTA GGCAAATCCA AATCTTAGCA GAGAAAATGA 10440 

TAGCTGAAAA TATTTTACAA TTCTTCCCTG AGAGTTTGAC AAGATATGGT GATCTAGAGC 10500
TTCAAAAGAT ATTAGAATTA AAAGCAGGAA TAAGCAACAA GTCAAATCGT TATAATGATA 10560
ACTACAACAA TTATATCAGT AAATGTTCTA TCATTACAGA TCTTAGCAAA TTCAATCAGG 10620
CATTTAGATA TGAAACATCA TGTATCTGCA GTGATGTATT AGATGAACTG CATGGAGTAC 10680
AATCTCTGTT CTCTTGGTTG CATTTAACAA TACCTCTTGT CACAATAATA TGTACATATA 10740
GACATGCACC TCCTTTCATA AAGGATCATG TTGTTAATCT TAATGAGGTT GATGAACAAA 10800
GTGGATTATA CAGATATCAT ATGGGTGGTA TTGAGGGCTG GTGTCAAAAA CTGTGGACCA 10860
TTGAAGCTAT ATCATTATTA GATCTAATAT CTCTCAAAGG GAAATTCTCT ATCACAGCTC 10920
TGATAAATGG TGATAATCAG TCAATTGATA TAAGCAAACC AGTTAGACTT ATAGAGGGTC 10980
AGACCCATGC ACAAGCAGAT TATTTGTTAG CATTAAATAG CCTTAAATTG TTATATAAAG 11040
AGTATGCAGG TATAGGCCAT AAGCTTAAGG GAACAGAGAC CTATATATCC CGAGATATGC 11100
AGTTCATGAG CAAAACAATC CAGCACAATG GAGTGTACTA TCCAGCCAGT ATCAAAAAAG 11160
TCCTGAGAGT AGGTCCATGG ATAAACACGA TACTTGATGA TTTTAAAGTT AGTTTAGAAT 11220
CTATAGGCAG CTTAACACAG GAGTTAGAAT ACAGAGGAGA AAGCTTATTA TGCAGTTTAA 11280
TATTTAGGAA CATTTGGTTA TACAATCAAA TTGCTTTGCA ACTCCGAAAT CATGCATTAT 11340
GTAACAATAA GCTATATTTA GATATATTGA AAGTATTAAA ACACTTAAAA ACTTTTTTTA 11400
ATCTTGATAG CATTGATATG GCTTTATCAT TGTATATGAA TTTGCCTATG CTGTTTGGTG 11460
GTGGTGATCC TAATTTGTTA TATCGAAGCT TTTATAGGAG AACTCCAGAC TTCCTTACAG 11520
AAGCTATAGT ACATTCAGTG TTTGTGTTGA GCTATTATAC TGGTCACGAT TTACAAGATA 11580
AGCTCCAGGA TCTTCCAGAT GATAGACTGA ACAAATTCTT GACATGTGTC ATCACATTTG 11640
ATAAAAATCC CAATGCCGAG TTTGTAACAT TGATGAGGGA TCCACAGGCT TTAGGGTCTG 11700
AAAGGCAAGC TAAAATTACT AGTGAGATTA ATAGATTAGC AGTAACAGAA GTCTTAAGTA 11760
TAGCCCCAAA CAAAATATTT TCTAAAAGTG CACAACATTA TACTACCACT GAGATTGATC 11820
TAAATGACAT TATGCAAAAT ATAGAACCAA CTTACCCTCA TGGATTAAGA GTTGTTTATG 11880
AAAGTTTACC TTTTTATAAA GCAGAAAAAA TAGTTAATCT TATATCAGGA ACAAAATCCA 11940
TAACTAATAT ACTTGAAAAA ACATCAGCAA TAGATACAAC TGATATTAAT AGGGCTACTG 12000
ATATGATGAG GAAAAATATA ACTTTACTTA TAAGGATACT TCCACTAGAT TGTAACAAAG 12060
ACAAAAGAGA GTTATTAAGT TTAGAAAATC TTAGTATAAC TGAATTAAGC AAGTATGTAA 12120

GAGAAAGATC TTGGTCATTA TCCAATATAG TAGGAGTAAC ATCGCCAAGT ATTATGTTCA 12180 

CAATGGACAT TAAATATACA ACTAGCACTA TAGCCAGTGG TATAATAATA GAAAAATATA 12240 

ATGTTAATAG TTTAACTCGT GGTGAAAGAG GACCCACCAA GCCATGGGTA GGCTCATCCA 12300 

CGCAGGAGAA AAAAACAATG CCAGTGTACA ACAGACAAGT TTTAACCAAA AAGCAAAGAG 12360 

ACCAAATAGA TTTATTAGCA AAATTAGACT GGGTATATGC ATCCATAGAC AACAAAGATG 12420 

AATTCATGGA AGAACTGAGT ACTGGAACAC TTGGACTGTC ATATGAAAAA GCCAAAAAGT 12480 

TGTTTCCACA ATATCTAAGT GTCAATTATT TACACCGTTT AACAGTCAGT AGTAGACCAT 12540 

GTGAATTCCC TGCATCAATA CCAGCTTATA GAACAACAAA TTATCATTTT GATACTAGTC 12600

CTATCAATCA TGTATTAACA GAAAAGTATG GAGATGAAGA TATCGACATT GTGTTTCAAA 12660

ATTGCATAAG TTTTGGTCTT AGCCTGATGT CGGTTGTGGA ACAATTCACA AACATATGTC 12720

CTAATAGAAT TATTCTCATA CCGAAGCTGA ATGAGATACA TTTGATGAAA CCTCCTATAT 12780

TTACAGGAGA TGTTGATATC ATCAAGTTGA AGCAAGTGAT ACAAAAGCAG CACATGTTCC 12840 

TACCAGATAA AATAAGTTTA ACCCAATATG TAGAATTATT CTTAAGTAAC AAAGCACTTA 12900 

AATCTGGATC TCACATCAAC TCTAATTTAA TATTAGTACA TAAAATGTCT GATTATTTTC 12960 

ATAATGCTTA TATTTTAAGT ACTAATTTAG CTGGACATTG GATTCTGATT ATTCAACTTA 13020

TGAAAGATTC AAAAGGTATT TTTGAAAAAG ATTGGGGAGA GGGGTACATA ACTGATCATA 13080

TGTTCATTAA TTTGAATGTT TTCTTTAATG CTTATAAGAC TTATTTGCTA TGTTTTCATA 13140

AAGGTTATGG TAAAGCAAAA TTAGAATGTG ATATGAACAC TTCAGATCTT CTTTGTGTTT 13200

TGGAGTTAAT AGACAGTAGC TACTGGAAAT CTATGTCTAA AGTTTTCCTA GAACAAAAAG 13260

TCATAAAATA CATAGTCAAT CAAGACACAA GTTTGCGTAG AATAAAAGGC TGTCACAGTT 13320

TTAAGTTGTG GTTTTTAAAA CGCCTTAATA ATGCTAAATT TACCGTATGC CCTTGGGTTG 13380 

TTAACATAGA TTATCACCCA ACACACATGA AAGCTATATT ATCTTACATA GATTTAGTTA 13440 

GAATGGGGTT AATAAATGTA GATAAATTAA CCATTAAAAA TAAAAACAAA TTCAATGATG 13500 

AATTTTACAC ATCAAATCTC TTTTACATTA GTTATAACTT TTCAGACAAC ACTCATTTGC 13560

TAACAAAACA AATAAGAATT GCTAATTCAG AATTAGAAGA TAATTATAAC AAACTATATC 13620
ACCCAACCCC AGAAACTTTA GAAAATATGT CATTAATTCC TGTTAAAAGT AATAATAGTA 13680
ACAAACCTAA ATTTTGTATA AGTGGAAATA CCGAATCTAT GATGATGTCA ACATTCTCTA 13740
GTAAAATGCA TATTAAATCT TCCACTGTTA CCACAAGATT CAATTATAGC AAACAAGACT 13800
TGTACAATTT ATTTCCAATT GTTGTGATAG ACAAGATTAT AGATCATTCA GGTAATACAG 13860
CAAAATCTAA CCAACTTTAC ACCACCACTT CACATCAGAC ATCTTTAGTA AGGAATAGTG 13920
CATCACTTTA TTGCATGCTT CCTTGGCATC ATGTCAATAG ATTTAACTTT GTATTTAGTT 13980
CCACAGGATG CAAGATCAGT ATAGAGTATA TTTTAAAAGA TCTTAAGATT AAGGACCCCA 14040

GTTGTATAGC ATTCATAGGT GAAGGAGCTG GTAACTTATT ATTACGTACG GTAGTAGAAC 14100 

TTCATCCAGA CATAAGATAC ATTTACAGAA GTTTAAAAGA TTGCAATGAT CATAGTTTAC 14160

CTATTGAATT TCTAAGGTTA TACAACGGGC ATATAAACAT AGATTATGGT GAGAATTTAA 14220

CCATTCCTGC TACAGATGCA ACTAATAACA TTCATTGGTC TTATTTACAT ATAAAATTTG 14280

CAGAACCTAT TAGCATCTTT GTCTGCGATG CTGAATTACC TGTTACAGCC AATTGGAGTA 14340 

AAATTATAAT TGAATGGAGT AAGCATGTAA GAAAGTGCAA GTACTGTTCT TCTGTAAATA 14400 

GATGCATTTT AATTGCAAAA TATCATGCTC AAGATGACAT TGATTTCAAA TTAGATAACA 14460 

TTACTATATT AAAAACTTAC GTGTGCCTAG GTAGCAAGTT AAAAGGATCT GAAGTTTACT 14520 

TAATCCTTAC AATAGGCCCT GCAAATATAC TTCCTGTTTT TGATGTTGTA CAAAATGCTA 14580 

AATTGACACT TTCAAGAACT AAAAATTTCA TTATGCCTAA AAAAACTGAC AAGGAATCTA 14640

TCGATGCAAA TATTAAAAGC TTAATACCTT TCCTTTGTTA CCCTATAACA AAAAAAGGAA 14700

TTAAGACTTC ATTGTCAAAA TTGAAGAGTG TAGTTAATGG AGATATATTA TCATATTCTA 14760

TAGCTGGACG TAATGAAGTA TTCAGCAACA AGCTTATAAA CCACAAGCAT ATGAATATCC 14820 

TAAAATGGCT AGATCATGTT TTAAATTTTA GATCAGCTGA ACTTAATTAC AATCATTTAT 14880

ACATGATAGA GTCCACATAT CCTTACTTAA GTGAATTGTT AAATAGTTTA ACAACCAATG 14940 

AGCTCAAGAA GCTGATTAAA ATAACAGGTA GTGTGCTATA CAACCTTCCC AACGAACAGT 15000 

AGTTTAAAAT ATCATTAACA AGTTTGGTCA AATTTAGATG CTAACACATC ATTATATTAT 15060 

AGTTATTAAA AAATATACAA ACTTTTCAAT AATTTAGCAT ATTGATTCCA AAATTATCAT 15120

TTTAGTCTTA AGGGGTTAAA TAAAAGTCTA AAACTAACAA TTATACATGT GCATTCACAA 15180
CACAACGAGA CATTAGTTTT TGACACTTTT TTTCTCGT
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2166 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn
130 135 140

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu
980 985 990

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala
995 1000 1005

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala
1540 1545 1550

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu
1555 1560 1565

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn
1605 1610 1615

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Thr Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15229 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

ACGCGAAAAA ATGCGTACTA CAAACTTGCA CATTCGGAAA AAATGGGGCA AATAAGAATT 60 

TGATAAGTGC TATTTAAATC TAACCTTTTC AATCAGAAAT GGGGTGCAAT TCACTGAGCA 120 

TGATAAAGGT TAGATTACAA AATTTATTTG ACAATGACGA AGTAGCATTG TTAAAAATAA 180

CATGTTATAC TGACAAATTA ATTCTTCTGA CCAATGCATT AGCCAAAGCA GTAATACATA 240 

CAATTAAATT AAACGGCATA GTTTTTATAC ATGTTATAAC AAGCAGTGAA GTGTGCCCTG 300 

ACAACAATAT TGTAGTGAAA TCTAACTTTA CAACAATGCC AATATTACAA AACGGAGGAT 360 

ACATATGGGA ATTGATTGAG TTGACACACT GCTCTCAATC AAATGGTCTA ATGGATGATA 420 

ATTGTGAAAT CAAATTTTCT AAAAGACTAA GTGACTCAGT AATGACTAAT TATATGAATC 480 

AAATATCTGA TTTACTTGGG CTTGATCTCA ATTCATGAAT TATGTTTAGT CTAATTTAAT 540

AGACATGTGT TTATCACCAT TTTAGTTAAT ATAAAACCTC ATCAAAGGGA AATGGGGCAA 600 

ATAAACTCAC CTAATCAGTC AAACCATGAG CACTACAAAT GACAACACTA CTATGCAAAG 660

ATTGATGATC ACAGACATGA GACCCCTGTC GATGGAATCA ATAATAACAT CTCTCACCAA 720 

AGAAATCATA ACACACAAAT TCATATACTT GATAAACAAT GAATGTATTG TAAGAAAACT 780

TGATGAAAGA CAAGCTACAT TTACATTCTT AGTCAATTAT GAGATGAAGC TATTGCACAA 840 






AGTA6GOAGT 


ACCAAATACA 


AGAAATACAC 


TGAATATAAT 


ACAAAATATf! 


IJLAL XXX LLL 




CATGCCTATA 


TTTAT CAATC 


ATGACGGGTT 


TCTAGAATGT 


ATTGC3PATTA 


A\J\«. V_ X nV« AAA 


y o u 


ACACACTCCT 


ATAATATACA 


AATATGACCT 


CAACCCGTAA 


ATTCPAACAA 

A X X W\vAA.\_AA 


AAA Ar*TA arr 

AAaaV* X AAUL 




CATCCAAACT 


AAGCTATTCC 


TCAAACAACA 


GTGCTPAArA 

v» X urv_ X UAAV.A 


TT A A fl A A fZT* 
iJ X 1 aAuaauu 


AvjC T AATC CA 


1080 


TTTTAGTAAT 


TAAAAATAAA 


GGCAGAnCPA 


AT A AP AT A A & 

A X AAwi X AAA 


X x bbuViUUlA 


TACAAAGATG 


1140 


GCTCTTAGCA 


AAGTCAAGTT 


AAATGATACA 


TTAAATAARG 

X X AAA X AnV^VJ 


A X V—AVL X X 




1200 


AAATACACTA 


TTCAACGTAG 


TACAGGAGAT 


AATATTfiAPA 

AA X A X X ui\Ui\ 


X LLLAAl 1 A 


x wATvj 1 UCIAA 


12 6 0 


AAACACCTAA 


ACAAACTATG 


TGGTATGCTA 

«L VJVJ X A X \JU x, A 


X X aa X LAV, 1 V? 


»»n» 1» TV TV TV 

AAbA 1 L»C AAA 


TCATAAATT C 


1320 


ACAGGATTAA 


T AGGT A TfJTT 


A X A X X A X \jt 


imAlvwX lAut 


GAAGGGAAGA 


CACTATAAAG 


1380 


ATACTTAAAG 


ATGCTGGATA 


TPATHTTA A A 

X l*-AX VJi 1 AAA 


X AA X UVjAu 


fT* T\ /""♦ TV I 1 TV en % <& n 

I AOA x ATAAC 


AACATATCGT 


1440 


CAAGATATAA 


ACGGAAAGGA 


AATGAAAT T C 


wUiy X A X X AA 


\_a X X A X UAAo 


/ Hinpj-i TV fTl »T»*™» » 

LTxTGACATCA 


1500 


GAAATACAAG 


TCAATATTGA 


GATAGAATCT 


Aft A A ftCTfCT 


AUUVAMAA 1 


PV' "I* TA TV TV TV TV /**> 


i c c n 
1560 


ATGGGAGAAG 


TGGCTCCAGA 


ATATAGGCAT 


n a ttpt r* r* a r 1 


At X X VjljijAl 


/-I H fTi jv TV T TV /"•'IV* 

(jATAaTACTG 


1620 


TGTATAGCTG 


CACTTGTAAT 


AJV C fAACTT A 




A 1 AL» A 1 t_At»Li 


1 C x X At-AU ua 


1680 


GTAATTAGGA 


GGGCAAACAA 


TGTCTTAAAA 


AACGAAATAA 


A A r*f2r*T A f A A 

AAv* vj»\_ X AL AA 


VjVjwV_V^. X LfAI a 


i *7 a n 
x / ^ 0 


CCAAAGGATA 


TAGCTAACAG 


TTTTTATGAA 


GTGTTTGAAA 


AACAPCf^TCA 


Tf*TTATAflAT 
iv,l X a X nVj a X 


i nnn 


GTTTTTGTGC 


ACTTTGGCAT 


TGCACAATCA 


TC CACAAGAG 




Avy x x unfiViuA 


XODU 


ATCTTTGCAG 


GATTATTTAT 


GAATGCCTAT 




A A RT A A TfSPT 
AAO lAAluL X 


nnvjA X \» uubA 




GTTCTAGCCA AATCTGTAAA AAATATCATG CTAGGACATG CTAGTGTCCA GGCAGAAATG 1980

GAACAAGTTG TGGAAGTTTA TGAGTATGCA CAGAAGTTGG GAGGAGAAGC TGGATTCTAC 2040

CATATATTGA ACAATCCAAA AGCATCATTG CTGTCATTAA CTCAATTTCC TAACTTCTCA 2100

AGTGTGGTCC TAGGCAATGC AGCAGGTCTA GGCATAATGG GAGAGTATAG AGGTACACCA 2160

AGAAACCAAG ATCTATATGA TGCAGCCAAA GCATATGCAG AGCAACTCAA AGAAAATGGA 2220

GTAATAAACT ACAGTGTATT AGACTTAACA GCAGAAGAAT TGGAAGCCAT AAAGCATCAA 2280

CTCAACCCCA AAGAAGATGA TGTAGAGCTT TAAGTTAACA AAAAATACGG GGCAAATAAG 2340

TCAACATGGA GAAGTTTGCA CCTGAATTTC ATGGAGAAGA TGCAAACAAC AAAGCTACCA 2400

AATTCCTAGA ATCAATAAAG GGCAAGTTTG CATCATCCAA AGATCCTAAG AAGAAAGATA 2460

GCATAATATC TGTTAACTCA ATAGATATAG AAGTAACTAA AGAGAGCCCG ATAACATCTG 2520 

GCACCAACAT CATCAATCCA ATAAGTGAAG CTGATAGTAC CCCAGAAGCT AAAGCCAACT 2580 

ACCCAAGAAA ACCCCTAGTA AGCTTCAAAG AAGATCTCAC CCCAAGTGAC AACCCCTTTT 2640

CTAAGTTGTA CAAAGAAACA ATAGAAACAT TTGATAACAA TGAAGAAGAA TCTAGCTACT 2700 

CATATGAAGA AATAAATGAT CAAACAAATG ACAACATTAC AGCAAGACTA GATAGAATTG 2760 

ATGAAAAATT AAGTGAAATA TTAGGAATGC TCCATACATT AGTAGTTGCA AGTGCAGGAC 2820 

CCACCTCAGC TCGCGATGGA ATAAGAGATG CTATGGTTGG TCTAAGAGAA GAAATGATAG 2880

AAAAAATAAG AGCGGAAGCA TTAATGACCA ATGATAGGTT AGAGGCTATG GCAAGACTTA 2940

GGAATGAGGA AAGCGAAAAA ATGGCAAAAG ACACCTCAGA TGAAGTGTCT CTTAATCCAA 3000 

CTTCCAAAAA ATTGAGTAAT TTGTTGGAAG ACAACGATAG TGACAATGAT CTATCACTTG 3060 

ATGATTTTTG ATCAGTGATC AACTCACTCA GCAATCAACA ACATCAATGA AACAGACATC 3120

AATCCATTGA ATCAACTGCC AGACTGAACA CACAAACGTC CATCAGCAGA ACTACCAACC 3180 

AATCAATCAA CCAATTGATC AATCAGCGAC CTAACAAAAT TAACAATATA GTAACAAAAA 3240 

AAGAACAAGA TGGGGCAAAT ATGGAAACAT ACGTGAACAA GCTTCACGAG GGCTCCACAT 3300 

ACACAGCAGC TGTTCAGTAC AATGTTCTAG AAAAAGATGA TGATCCTGCA TCACTAACAA 3360 

TATGGGTGCC TATGTTCCAG TCATCTGTGC CAGCAGACTT GCTCATAAAA GAACTTGCAA 3420 

GCATCAACAT ACTAGTGAAG CAGATCTCCA CGCCCAAAGG ACCTTCACTA CGAGTCACGA 3480 

TTAACTCAAG AAGTGCTGTG CTGGCACAAA TGCCTAGTAG TTTTATCATA AGTGCAAATG 3540 

TATCATTAGA TGAAAGAAGC AAATTAGCAT ATGATGTAAC TACACCTTGT GAAATCAAAG 3600 

CATGCAGTCT AACATGCTTA AAAGTAAAAA GTATGTTAAC TACAGTCAAA GATCTTACCA 3660 

TGAAAACATT CAATCCCACT CATGAGATTA TTGCTCTATG TGAATTTGAA AATATTATGA 3720 

CATCAAAAAG AGTAATAATA CCAACCTATC TAAGATCAAT TAGTGTCAAA AACAAGGACC 3780 

TGAACTCACT AGAAAATATA GCAACCACCG AATTCAAAAA TGCTATCACC AATGCGAAAA 3840 

TTATTCCCTA TGCAGGATTA GTATTAGTTA TCACAGTTAC TGACAATAAA GGAGCATTCA 3900 

AATATATCAA GCCACAGAGT CAATTTATAG TAGATCTTGG GGCCTACCTA GAAAAAGAGA 3960
GCATATATTA TGTGACTACA AATTGGAAGC ATACAGCTAC ACGTTTTTCA ATCAAACCAC 4020

TAGAGGATTA AACTTAATTA TCAACACTAA ATGACAGGTC CACATATATC TTCAAACTAT 4080 

ACATTATATC CAAACATCAT GAGCATTTAC ACTACACACT TTTACCATAT AAATCAATCT 4140 

CATTTAAAAT CCAAAATTAC TTCCAGCTAT CATCTGTTAG ACCTAGAGTG CGAATAGGTA 4200 

AATAAAACCA AAATATGGGG TAAATAGACA TTAGTTAGAG TTCAATCAAT CTCAACAACC 4260 

ATTTATACCG CCAATTCAGT ACATATACTA TAAATCTCAA AATGGGAAAT ACATCCATCA 4320 

CAATAGAATT CACAAGCAAA TTTTGGCCTT ATTTTACACT AATACATATG ATCTTAACTC 4380 

TAATCTCTTT ACTAATTATA ATCACTATTA TGATTGCAAT ACTAAATAAG CTAAGTGAAC 4440 

ATAAAACATT CTGCAACAAA ACTCTTGAAC TAGGACAGAT GTATCAAATC AACACATAGT 4500 

GTTCTACCAT TATGCTGTGT CAAATTATAA TCTTGTATAT ATAAACAAAC AAATCCAATC 4560 

TTCTCACAGA GTCATGGTGG CGCAAAACCA CGCCAACCAT CATGATAGCA TAGAGTAGTT 4620

ATTTAAAAAT TAACATAATG ATGAATTATT GGTATGAGAT CAGGAACAAC ATTGGGGCAA 4680 

ATGCAGCCAT GTCCAAGCAC AAGAATCGGC GCACTGCCGG GACTCTAGAA AGGACCTGGG 4740 

ATACTCTTAA TCATCTAATT GTAATATCCT CTTGTTTATA CAGATTAAAT TTAAAATCTA 4800 

TAGCACAAAT AGCACTGTCA GTTTTGGCAA TGATAATCTC AACCTCTCTC ATAATTGCAG 4860

CCATAATATT CATCATCTCT GCCAATCACA AAGTTACACT AACAACGGTT ACAGTTCAAA 4920 

CAATAAAAAA CCACACTGAA AAAAACATCT CCACCTACCT TACTCAAGTC CCACCAGAAA 4980 

GGGTCAACTC ATCCAAACAA CCCACAACCA CATCACCAAT CCACACAAAT TCAGCCACAA 5040 

TATCACCAAA TACAAAATCA GAAACACACC ATACAACAGC ACAAACCAAA GGCAGAATCA 5100 

CCACTTCAAC ACAGACCAAC AAGCCAAGCA CAAAATCACG TTCAAAAAAT CCACCAAAAA 5160 

AACCAAAAGA TGATTACCAT TTTGAAGTGT TCAATTTTGT TCCCTGTAGT ATATGTGGTA 5220 

ATAATCAACT CTGCAAATCC ATCTGCAAAA CAATACCAAG CAACAAACCA AAGAAAAAAC 5280 

CAACCATCAA ACCCACAAAC AAACCAACCA CCAAAACCAC AAACAAAAGA GACCCCAAAA 5340 

CACCAGCCAA AATGCCAAAA AAAGAAATCA TCACCAACCC AGCAAAAAAA CCAACCCTCA 5400 

AGACCACAGA AAGAGACACC AGCATTTCAC AATCCACCGT GCTCGACACA ATCACTCCAA 5460 

AATACACAAT CCAACAGCAA TCCCTCCACT CAACCACCTC CGAAAACACA CCCAGCTCCA 5520 






CACAAATACC CACAG CAT CC GAGCCCTCCA CATTAAATCC TAATTAAAAA ACCTAGTCAC 
ATGCTTAGTT ATTCAAAAAC TACATCTTAG CAGAGAACCG TGATCTATCA AGCAAGAACA 
AAATTAAACC TGGGGCAAAT AACCATGGAG TTGCTGATCC ACAGGTCAAG TGCAATCTTC 
CTAACTCTTG CTGTTAATGC ATTGTACCTC ACCTCAAGTC AGAACATAAC TGAGGAGTTT 
TAC CAATCG A CATGTAGTGC AGTTAGCAGA GGTTATTTTA GTGCTTTAAG AACAGGTTGG 
TATACCAGTG TCATAACAAT AGAATTAAGT AATATAAAAG AAACCAAATG CAATGGAACT 
GACACTAAAG TAAAACTTAT AAAACAAGAA TTAGATAAGT ATAAGAATGC AGTAACAGAA 
T TAC AGCTAC TTATGCAAAA CACGCCAGCT GCCAACAACC GGGCCAGAAG AG AAGCAC CA 
CAGTACATGA ACTACACAAT CAATACCACA AAAAACCTAA ATGTATCAAT AAGCAAGAAA 
AGGAAACGAA GATTTCTGGG CTTCTTGTTA GGTGTAGGAT CTGCAATAGC AAGTGGTATA 
GCTGTATCCA AAGTTTTACA CCTTGAAGGA GAAGTGAACA AAATCAAAAA TGCTTTGTTG 
TCTACAAACA AAGCTGTAGT CAGTCTATCA AATGGGGTCA GTGTTTTAAC CAGCAAAGTG 
TTAGATCTCA AGAATTACAT AAATAACCGA ATATTACCCA TAGTAAATCA ACAGAGCTGT 
CGCATCTCCA ACATTGAAAC AGTTATAGAA TTCCAGCAGA AGAATAGCAG ATTGTTGGAA 
ATCACCAGAG AATTTAGTGT TAATGCAGGT GTAACAACAC CTTTAAGCAC TTACATGTTA 
ACAAACAGTG AGTTACTATC ATTGATCAAT GATATGCCTA TAACAAATGA CCAGAAAAAA 
TTAATGTCAA GCAATGTTCA GATAGTAAGG CAACAAAGTT ATTCTATCAT GTCTATAATA 
AAGGAAGAAG TCCTTGCATA TGTTGTACAG CTACCTATCT ATGGTGTAAT AGATACACCT 
TGCTGGAAAT TACACACATC ACCTCTATGC ACCACCAACA TCAAAGAAGG ATCAAATATT 
TGTTTAACAA GGACTGATAG AGGATGGTAT TGTGATAATG CAGGATCAGT ATCCTTCTTC 
CCACAGGCTG ATACTTGCAA AGTACAGTCC AATCGAGTAT TTTGTGACAC TATGAACAGT 
TTAACATTAC CAAGTGAAGT CAGCCTTTGT AACACTGACA TATTCAATTC CAAGTATGAC 
TGCAAAATTA TGACATCAAA AACAGACATA AGCAGCTCAG TAATTACTTC TCTTGGAGCT 
ATAGTGTCAT GCTATGGAAA AACTAAATGC ACTGCATCCA ATAAAAATCG TGGGATTATA 
AAGACATTTT CTAATGGTTG TGACTATGTG TCAAACAAAG GAGTAGATAC TGTGTCAGTG 
GGCAACACTT TATACTATGT AAACAAGCTG GAAGGCAAAA ACCTTTATGT AAAAGGGGAA 
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CCTATAATAA ATTACTATGA TCCTCTAGTG TTTCCTTCTG ATGAGTTTGA TGCATCAATA 7140 

TCTCAAGTCA ATGAAAAAAT CAATCAAAGT TTAGCTTTTA TTCGTAGATC TGATGAATTA 7200 

CTACATAATG TAAATACTGG CAAATCTACT ACAAATATTA TGATAACTAC AATTATTATA 7260 

GTAATCATTG TAGTATTGTT ATCATTAATA GCTATTGGTT TACTGTTGTA TTGCAAAGCC 7320 

AAAAACACAC CAGTTACACT AAGCAAAGAC CAACTAAGTG GAATCAATAA TATTGCATTC 7380 

AGCAAATAGA CAAAAAACTA CTTAATCATG TTTCAACAAC AATCTGCTGA CCACCAATCC 7440 

CAAATCAACT TAACAACAAA TATTTCAACA TCATAGCACA GGCTGAATCA TTTCCTCATA 7500 

TCATGCTACC TACACAACTA AGCTAGATCT TCAACTCATA GTTACATAAA AACCCCAAGT 7560 

ATCACAATCA AACACTAAAT CGACACATCA TTCACAAAAT TAACAACTGG GGCAAATATG 7620 

TCGCGAAGAA ATCCTTGTAA ATTTGAGATT AGAGGTCATT GCTTGAATGG TAGAAGATGT 7680 

CACTACAGTC ATAATTATTT TGAATGGCCT CCTCATGCAT TACTAGTGAG GCAAAACTTC 7740 

ATGTTAAACA AGATACTTAA GTCAATGGAC AAAAGCATAG ACACTTTGTC GGAAATAAGT 7800 

GGAGCTGCTG AACTGGATAG AACAGAAGAA TATGCTCTTG GTATAGTTGG AGTGCTAGAG 7860 

AGTTACATAG GATCAATAAA CAACATAACA AAACAATCAG CATGTGTTGC TATGAGTAAA 7920 

CTTCTTATTG AGATCAACAG TGATGACATT AAAAAACTGA GAGATAACGA AGAACCCAAT 7980 

TCGCCTAAGA TAAGAGTGTA CAATACTGTT ATATCATACA TTGAGAGCAA TAGAAAAAAC 8040 

AACAAGCAAA CCATCCATCT GCTCAAAAGA CTACCAGCAG ACGTGCTGAA GAAGACAATA 8100 

AAGAACACAT TAGATATCCA CAAAAGCATA ACCATAAGCA ACTCAAAAGA GTCAACCGTG 8160 

AATGATCAAA ATGACCAAAC CAAAAATAAT GATATTACCG GATAAATATC CTTGTAGTAT 8220 

ATCATCCATA TTGATTTCAA GTGAAAGCAT GATTGCTACA TTCAATCATA AAAACATATT 8280 

ACAATTTAAC CATAACCATT TGGATAACCA CCAGTGTTTA TTAAATCATA TATTTGATGA 8340 

AATTCATTGG ACACCTAAAA ACTTATTAGA TGCCACTCAA CAATTTCTCC AACATCTTAA 8400 

CATCCCTGAA GATATATATA CAGTATATAT ATTAGTGTCA TAATGCTTGA CCATAACAAT 8460 

TTTATATCAT TCAACCATAA AACAACCTTA ATAAGGTTAT GGGACAAAAT GGATCCCATT 8520 

ATTAATGGAA ACTCTGCCAA TGTGTATCTA ACTGATAGTT ATCTAAAAGG TGTTATCTCT 8580 

TTTTCAGAAT GTAATGCTTT AGGGAGTTAC CTTTTTAACG GCCCCTATCT TAAAAATGAT 8640 
TATATAGAAC ATGAAAAGTT GAAGTTCTCT GAAAGTGACA GATCAAGAAG AGTACTAGAG 10260 

TATTACTTGA GAGATAATAA ATTCAATGAA TGCGATCTAT ACAATTGTGT GGTCAATCAA 10320 

AGCTATCTCA ACAACTCTAA CCATGTGGTA TCACTAACTG GTAAAGAAAG AGAGCTCAGT 10380 

GTAGGTAGAA TGTTTGCTAT GCAACCAGGT ATGTTTAGGC AAATTCAAAT CTTAGCAGAG 10440 

AAAATGATAG CCGAAAATAT TTTACAATTC TTCCCTGAGA GTTTGACAAG ATATGGTGAT 10500 

CTAGAGCTTC AAAAGATATT AGAATTAAAA GCAGGAATAA GCAACAAGTC AAATCGTTAT 10560 

AATGATAACT ACAACAATTA TATCAGTAAA TGTTCTATCA TTACAGACCT TAGCAAATTC 10620 

AATCAAGCAT TTAGATATGA AACATCATGT ATCTGCAGTG ATGTATTAGA TGAACTGCAT 10680 

GGAGTACAAT CTCTGTTCTC TTGGTTGCAT TTAACAATAC CTCTTGTCAC AATAATATGT 10740 

ACATATAGAC ATGCACCTCC TTTTATAAAG GATCATGTTG TTAATCTTAA TAAAGTTGAT 10800 

GAACAAAGTG GATTATACAG ATATCATATG GGTGGTATTG AAGGCTGGTG TCAAAAACTG 10860 

TGGACCATTG AAGCTATATC ATTATTAGAT CTAATATCTC TCAAAGGGAA ATTCTCTATC 10920 

ACAGCTCTAA TAAATGGTGA TAATCAGTCA ATTGATATAA GTAAACCAGT TAGACTTATA 10980 

GAGGGTCAGA CCCATGCTCA AGCAGATTAT TTGTTAGCAT TAAATAGCCT TAAATTGCTA 11040 

TATAAAGAGT ATGCGGGCAT AGGCCACAAG CTCAAGGGAA CAGAGACCTA TATATCCCGA 11100 

GATATGCAAT TCATGAGCAA AACAATCCAG CACAATGGAG TGTACTATCC AGCCAGTATC 11160 

AAAAAAGTCC TGAGAGTAGG TCCATGGATA AATACAATAC TTGATGATTT TAAAGTTAGT 11220 

TTAGAATCTA TAGGTAGCTT AACACAGGAG TTAGAATATA GAGGAGAGAG CTTATTATGC 11280 

AGTTTAATAT TTAGGAACAT TTGGTTATAC AATCAAATTG CTTTGCAACT CCGAAATCAT 11340 

GCATTATGTC ACAATAAGCT ATATTTAGAT ATATTGAAAG TATTAAAACA CTTAAAAACT 11400 

TTTTTTAATC TTGATAGTAT TGATATGGCT TTAACATTGT ATATGAATTT GCCTATGCTG 11460 

TTTGGTGGTG GTGATCCTAA TTTGTTATAT CGAAGCTTTT ATAGGAGAAC TCCAGACTTC 11520 

CTTACAGAAG CTATAGTACA TTCAGTGTTT GTGTTGAGCT ATTATACTGG TCACGATTTA 11580 

CAAGATAAGC TCCAGGATCT TCCAGATGAT AGACTGAACA AATTCTTGAC ATGTATCATC 11640 

ACGTTTGATA AAAATCCCAA TGCCGAGTTT GTAACATTGA TGAGAGATCC ACAGGCTTTA 11700 

GGGTCTGAAA GGCAAGCAAA AATTACTAGT GAGATTAATA GATTAGCAGT GACAGAAGTC 11760 
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TTAAGTATAG CTCCAAACAA AATATTTTCT AAAAGTGCAC AACATTATAC TACCACTGAG 11820 

ATTGATCTAA ATGATATTAT GCAAAATATA GAACCAACTT ACCCTCATGG ATTAAGAGTT 11880 

GTTTATGAAA GTTTACCTTT TTATAAAGCA GAAAAAATAG TTAATCTTAT ATCAGGAACA 11940 

AAATCCATAA CTAATATACT TGAAAAAACA TCAGCAATAG ATTCAACTGA TATTAATAGG 12000 

GCTACTGATA TGATGAGGAA AAATATAACT TTACTTATAA GGATACTTCC ACTAGATTGT 12060 

AACAAAGACA AAAGAGAGTT ATTAAGTTTA GAAAATCTTA GTATAACTGA ATTAAGCAAG 12120 

TATGTAAGAG AAAGATCTTG GTCGTTATCC AATATAGTAG GAGTAACATC GCCAAGTATT 12180 

ATGTTCACAA TGGACATTAA ATATACAACT AGCACTATAG CCAGTGGTAT AATTATAGAA 12240 

AAATATAATG TTAATAGTTT AACTCGTGGT GAAAGAGGAC CTACTAAGCC ATGGGTAGGT 12300 

TCATCTACGC AGGAGAAAAA AACAATGCCA GTGTACAATA GACAAGTTTT AACCAAAAAG 12360 

CAAAGAGACC AAATAGATTT ATTAGCAAAA TTAGACTGGG TATATGCATC CATAGACAAC 12420 

AAAGATGAAT TCATGGAAGA ACTGAGTACT GGAACACTTG GACTGTCATA TGAGAAAGCC 12480 

AAAAAATTGT TTCCACAATA TCTAAGTGTC AATTATTTAC ACCGCTTAAC AGTCAGTAGT 12540 

AGACCATGTG AATTCCCTGC ATCAATACCA GCTTATAGAA CAACAAATTA TCATTTCGAT 12600 

ACTAGTCCTA TCAACCATGT ATTAACAGAA AAGTATGGAG ATGAAGATAT CGACATTGTG 12660 

TTTCAAAATT GCATAAGTTT TGGTCTTAGC TTAATGTCGG TTGTGGAACA ATTCACAAAC 12720 

ATATGTCCTA ATAGAATTAT TCTCATACCG AAGCTGAATG AGATACATTT GATGAAACCT 12780 

CCTATATTTA CAGGAGATGT TGATATCATC AAGTTGAAGC AAGTGATACA AAAACAGCAC 12840 

ATGTTCCTAC CAGATAAAAT AAGTTTAACC CAATATGTAG AATTATTCCT AAGTAACAAA 12900 

GCACTTAAAT CTGGATCTCA CATCAACTCT AATTTAATAT TAGTACATAA AATGTCTGAT 12960 

TATTTTCATA ATGCTTATAT TTTAAGTACT AATTTAGCTG GACATTGGAT TCTGATTATT 13020 

CAACTTATGA AGGATTCAAA AGGTATTTTT GAAAAAGATT GGGGAGAGGG GTATATAACT 13080 

GATCATATGT TCATTAATTT GAATGTTTTC TTTAATGCTT ATAAGACTTA TTTGCTATGT 13140 

TTTCATAAAG GTTATGGTAA AGCAAAATTA GAATGTGATA TGAACACTTC AGATCTTCTT 13200 
TGTGTTTTGG AGCTAATAGA CAGTAGCTAC TGGAAATCTA TGTCTAAAGT TTTCCTAGAA 13260 

CAAAAAGTCA TAAAATACAT AATCAATCAA GACACAAGTT TGCATAGAAT AAAAGGTTGT 13320 

CATAGTTTTA AGTTATGGTT TTTAAAACGC CTTAATAATG CTAAATTTAC CGTATGCCCT 13380 

TGGGTTGTTA ACATAGATTA TCACCCAACA CACATGAAAG CTATATTATC TTACATAGAT 13440 

TTAGTTAGAA TGGGGTTAAT AAATGTAGAT AAATTAACCA TTAAAAATAA AAATAAATTC 13500 

AATGATGAAT TTTACACATC AAATCTCTTT TACATTAGTT ATAACTTTTC AGATAACACT 13560 

CATTTGCTAA CAAAACAAAT AAGAATTGCT AATTCAGAAT TAGAAAATAA TTATAACAAA 13620 

CTATATCACC CAACCCCAGA AACTTTAGAA AATATGTCAT TAATTCCTGT CAAAAGTAAT 13680 

AATAGTAATA AACCTAAATT TGGTATAAGT GGAAATACCG AATCTATGAT GACGTCAACA 13740 

TTCTCCAATA AAACGCATAT TAAATCTTCC GCTGTTATTA CAAGATTCAA TTATAGTAAA 13800 

CAAGACTTGT ACAATTTATT TCCAATTGTC GTGATAGACA GGATTATAGA TCATTCAGGT 13860 

AATACAGCAA AATCTAACCA ACTCTACACT ACCACTTCAC ATCAGACATC TTTAGTAAGG 13920 

AATAGTGCAT CACTTTATTG CATGCTTCCT TGGCATCATG TCAATAGATT TAACTTTGTA 13980 

TTTAGTTCCA CAGGATGCAA GATCAGTATA GAGTATATTT TAAAAGATCT TAAGATTAAA 14040 

GACCCCAGTT GTATAGCATT CATAGGTGAA GGAGCTGGTA ACTTATTATT ACGTACAGTA 14100 

GTAGAACTTC ATCCAGACAT AAGATACATT TACAGAAGTT TAAAAGATTG CAATGATCAT 14160 

AGTTTACCTA TTGAATTTCT AAGGTTATAC AACGGGCATA TAAACATAGA TTATGGTGAG 14220 

AATTTAACCA TTCCTGCTAC AGATGCAACT AATAACATTC ATTGGTCTTA TTTACATATA 14280 

AAATTTGCAG AACCTATTAG CATTTTTGTC TGCGATGCTG AATTACCTGT TACAGCCAAT 14340 

TGGAGTAAAA TTATAATTGA ATGGAGTAAG CATGTAAGAA AGTGCAAGTA CTGTTCCTCT 14400 

GTAAATAGAT GCATTTTAAT TGCAAAATAT CATGCCCAAG ATGATATTGA TTTCAAATTA 14460 

GATAACATTA CTATATTAAA AACTTACGTG TGCCTAGGTA GCAAGTTAAA AGGATCTGAA 14520 

GTTTACTTAG TCCTTACAAT AGGCCCTGCA AATATACTTC CTGTTTTTAA TGTTGTGCAA 14580 

AATGCTAAAT TGATTCTTTC AAGGACTAAA AATTTCATTA TGCCTAAAAA AACTGACAAA 14640 

GAATCTATCG ATGCAAATAT TAAAAGCTTA ATACCTTTCC TTTGTTACCC TATAACAAAA 14700 

AAAGGAATTA AGACTTCATT GTCAAAATTG AAGAGTGTAG TTAGTGGAGA TATATTATCA 14760 

TATTCTATAG CTGGACGTAA TGAAGTATTC AGCAACAAGC TTATAAACCA CAAGCATATG 14820 

AATATCCTAA AATGGCTAGA TCATGTTTTA AACTTTAGAT CAGCTGAACT TAATTACAAT 14880 






CATTTATATA TGATAGAGTC CACATATCCT TACTTAAGTG AATTGTTAAA CAGTTTAACA 14940 

ACCAATGAGC TCAAGAAGCT GATTAAAATA ACAGGTAGTG TACTATACAA CCTTCCCAAC 15000 

GAACAGTAAC TTAAAACATC ATTAACAAGT TTGATCAAAT TTAGATGCTA ACACATCATA 15060 

ATATTATAGT TATTAAAAAA TATATATGCA AACTTTTCAA TAATTTAGCA TATTGATTCC 15120 

AAAGTTATCA TTTTGGTCTT AAGGGGTTGA ATAAAAATCT AAAACTAACA ATTATACATG 15180 

TGCATTTACA ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 15229 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 
1 5 10 15 

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 
20 25 30 

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 
35 40 45 

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 
50 55 60 

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 
65 70 75 80 

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 
85 90 95 

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 
100 105 110 

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 
115 120 125 

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 
130 135 140 

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 
145 150 155 160 

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Asn 
165 170 175 

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 
180 185 190 

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe 
195 200 205 

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn 
210 215 220 

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser 
225 230 235 240 

Asp Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys 
245 250 255 

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 
260 265 270 

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 
275 280 285 

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 
290 295 300 

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 
305 310 315 320 

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 
325 330 335 

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 
340 345 350 

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 
355 360 365 

Ile Lys Ala Gln Lys Asn Leu Leu Ser Arg Val Cys His Thr Leu Leu 
370 375 380 

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 
385 390 395 400 

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu 
405 410 415 

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro 
420 425 430 

Met Val Asp Glu Arg Gln Ala Met Asp Ala Val Arg Ile Asn Cys Asn 
435 440 445 

Glu Thr Lys Phe Tyr Leu Leu Ser Asn Leu Ser Thr Leu Arg Gly Ala 
450 455 460 

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp 
465 470 475 480 

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr 
485 490 495 

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Lys Asp 
500 505 510 

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro 
515 520 525 

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro 
530 535 540 

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser 
545 550 555 560 

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser 
565 570 575 

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe 
580 585 590 

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn 
595 600 605 

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser 
610 615 620 

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln 
625 630 635 640 

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro 
645 650 655 

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu 
660 665 670 

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr 
675 680 685 

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe 
690 695 700 

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu 
705 710 715 720 

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr 
725 730 735 

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe 
740 745 750 

Ile Lys Asp His Val Val Asn Leu Asn Lys Val Asp Glu Gln Ser Gly 
755 760 765 

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu 
770 775 780 

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly 
785 790 795 800 

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp 
805 810 815 

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala 
820 825 830 

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 
835 840 845 

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg 
850 855 860 

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr 
865 870 875 880 

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr 
885 890 895 

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr 
900 905 910 

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe 
915 920 925 

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His 
930 935 940 

Ala Leu Cys His Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys 
945 950 955 960 






His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Thr 
965 970 975 

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 
980 985 990 

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 
995 1000 1005 

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 
1010 1015 1020 

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 
1025 1030 1035 1040 

Thr Cys Ile Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 
1045 1050 1055 

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 
1060 1065 1070 

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 
1075 1080 1085 

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 
1090 1095 1100 

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 
1105 1110 1115 1120 

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 
1125 1130 1135 

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 
1140 1145 1150 

Lys Thr Ser Ala Ile Asp Ser Thr Asp Ile Asn Arg Ala Thr Asp Met 
1155 1160 1165 

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 
1170 1175 1180 

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 
1185 1190 1195 1200 

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 
1205 1210 1215 

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr 
1220 1225 1230 

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 
1235 1240 1245 

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 
1250 1255 1260 

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 
1265 1270 1275 1280 

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 
1285 1290 1295 

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 
1300 1305 1310 

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 
1315 1320 1325 

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 
1330 1335 1340 

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 
1345 1350 1355 1360 

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 
1365 1370 1375 

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 
1380 1385 1390 

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 
1395 1400 1405 

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 
1410 1415 1420 

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 
1425 1430 1435 1440 

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 
1445 1450 1455 

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 
1460 1465 1470 

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 
1475 1480 1485 

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 
1490 1495 1500 

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 
1505 1510 1515 1520 

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 
1570 1575 1580 

Gln Lys Val Ile Lys Tyr Ile Ile Asn Gln Asp Thr Ser Leu His Arg 
1585 1590 1595 1600 

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 
1605 1610 1615 

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 
1620 1625 1630 

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 
1635 1640 1645 

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 
1650 1655 1660 

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 
1665 1670 1675 1680 

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 
1685 1690 1695 

Glu Leu Glu Asn Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 
1700 1705 1710 

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 
1715 1720 1725 

Pro Lys Phe Gly Ile Ser Gly Asn Thr Glu Ser Met Met Thr Ser Thr 
1730 1735 1740 

Phe Ser Asn Lys Thr His Ile Lys Ser Ser Ala Val Ile Thr Arg Phe 
1745 1750 1755 1760 

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 
1765 1770 1775 

Asp Arg Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 
1780 1785 1790 

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 
1795 1800 1805 

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 
1810 1815 1820 

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 
1825 1830 1835 1840 

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 
1845 1850 1855 

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 
1860 1865 1870 

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 
1875 1880 1885 

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 
1890 1895 1900 

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 
1905 1910 1915 1920 

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 
1925 1930 1935 

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 
1940 1945 1950 

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 
1955 1960 1965 

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 
1970 1975 1980 



Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 
1985 1990 1995 2000 

Lys Gly Ser Glu Val Tyr Leu Val Leu Thr Ile Gly Pro Ala Asn Ile 
2005 2010 2015 

Leu Pro Val Phe Asn Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg 
2020 2025 2030 

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 
2035 2040 2045 

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 
2050 2055 2060 

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Ser Gly 
2065 2070 2075 2080 

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 
2085 2090 2095 

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 
2100 2105 2110 

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 
2115 2120 2125 

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 
2130 2135 2140 

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 
2145 2150 2155 2160 

Asn Leu Pro Asn Glu Gln 
2165 

(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60 

TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120 

ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180 

ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240 

ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 300 

GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 360 

TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 420 

AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 480 

CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 540 
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TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 600 
AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 660 

GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 720 

AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 780 

TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 840 

AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 900 

CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 960 

AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 1020 

CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 1080 

CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 1140 

TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 1200 

GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC 1260 

AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 1320 

TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 1380 

AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 1440 

GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 1500 

CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 1560 

AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 1620 

TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 1680 

CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 1740 

TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 1800 

ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG 1860 

GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 1920 

GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 1980 

TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 2040 

ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 2100 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 2160 

CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 2220 

GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 2280 

AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 2340 

AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 2400 

CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 2460 

TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 2520 

TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 2580 

CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 2640 

TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 2700 

CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT 2760 

TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 2820 

ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 2880 

AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 2940 

TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 3000 

AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 3060 

TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 3120 

TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 3180 

CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 3240 

AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 3300 

ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 3360 

AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC 3420 

AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 3480 

GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 3540 

TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA 3600 

AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 3660 

CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT 3720 

GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 3780 

TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 3840 

AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 3900 

CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 3960 

GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 4020 

ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 4080 

ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 4140 

CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCCGCTA GACCTAGAGT GCGAATAGGC 4200 

AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 4260 

CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 4320 

ACAATAGAAC TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 4380 

CTAATCTTTT TACTAATTAT AATCACTATC ATGATTGCAA CACTAAATAA GCTAAGTGAA 4440 

CACAAAGCAT TCTGCAACAA AACTCTTGAA CTAGGACAGA TGTACCAAAT CAACACACAG 4500 

AGTTCCACCA TTATGCTGTG TCAAACCATA ATCCTGTATA TACAAACAAA CAAATCCAAT 4560 

CCTCTCACAG AGTCACGGTG TCGCAAAACC ACGCTAACCA TCATGGTAGC ATAGAGTAGT 4620 

TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 4680 

AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 4740 

GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 4800 

ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 4860 

GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 4920 

ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 4980 

AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 5040 

ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 5100 

ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 5160 

AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT 5220 
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6600 
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ATATTTGTT1: 


6660 
ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA 
AATTATGACA TCAAAAAGAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 
GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 
ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 
CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 
AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 
AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 
TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 
CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 
CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 
ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT 
CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC 
TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 
ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 
AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 
AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 
AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 
GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 
ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 
ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 
AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 
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TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACCATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA 
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC 
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 
TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 
AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 
GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 
ACAGAAGAAG ATCAATTTAA GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 
TGTAATGAAA CTAGGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 
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TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 
GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 
CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 
AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 
AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 
CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 
CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 
GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 
CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 
GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 
TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 
ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 
TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 
AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT 
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 11520 

GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 11580 

AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 11640 

GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 11700 

GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 11760 

ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 11820 

CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 11880 

GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 11940 

ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 12000 

GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 12060 

GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 12120 

AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 12180 

ACAATGAACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 12240 

AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 12300 

ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 12360 

GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 12420 

GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 12480 

TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 12540 

TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 12600 

CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 12660 

AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT 12720 

CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA 12780 

TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 12840 

CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 12900 

AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 12960 

CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 13020 

ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 13080 

ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 13140 

AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 13200 

TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 13260 

GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 13320 

TTTAAGTTGT GGTTTTTAAA ACGCCTTAAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 13380 

GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 13440 

AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 13500 

GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 13560 

CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 13620 

CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 13680 

AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 13740 

AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 13800 

TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 13860 

GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 13920 

GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 13980 

TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 14040 

AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 14100 

CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 14160 

CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 14220 

ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 14280 

GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT 14340 

AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 14400 

AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 14460 

ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 14520 

TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 14580 
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AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 14640 

ATCGATGCAA ATATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 14700 

ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 14760 

ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 14820 

CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 14880 

TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 14940 

GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 15000 

TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 15060 

TAGTTATTAA AGAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 15120 

TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 15180 

ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 15219 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 
1 5 10 15 

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 
20 25 30 

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 
35 40 45 

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 
50 55 60 

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 
65 70 75 80 

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 
85 90 95 

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 
100 105 110 

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 
115 120 125 

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 
130 135 140 

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 
145 150 155 160 

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser 
165 170 175 

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 
180 185 190 

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe 
195 200 205 

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn 
210 215 220 

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser 
225 230 235 240 

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys 
245 250 255 

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 
260 265 270 

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 
275 280 285 

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 
290 295 300 

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 
305 310 315 320 

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 
325 330 335 

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 
340 345 350 

Lys Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 
355 360 365 

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu 
370 375 380 

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 
385 390 395 400 

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu 
405 410 415 

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro 
420 425 430 

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn 
435 440 445 

Glu Thr Arg Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 
450 455 460 

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp 
465 470 475 480 

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr 
485 490 495 

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp 
500 505 510 

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro 
515 520 525 

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro 
530 535 540 

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser 
545 550 555 560 

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser 
565 570 575 

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe 
580 585 590 

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn 
595 600 605 

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser 
610 615 620 

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln 
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625 630 635 640 

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro 
645 650 655 

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu 
660 665 670 

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr 
675 680 685 

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe 
690 695 700 

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu 
705 710 715 720 

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr 
725 730 735 

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe 
740 745 750 

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly 
755 760 765 

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu 
770 775 780 

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly 
785 790 795 800 

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp 
805 810 815 

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala 
820 825 830 

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 
835 840 845 

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg 
850 855 860 

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr 
865 870 875 880 

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr 
885 890 895 

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr 
900 905 910 

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe 
915 920 925 

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His 
930 935 940 

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys 
945 950 955 960 

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser 
965 970 975 

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 
980 985 990 

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 
995 1000 1005 

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 
1010 1015 1020 

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 
1025 1030 1035 1040 

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 
1045 1050 1055 

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 
1060 1065 1070 

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 
1075 1080 1085 

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 
1090 1095 1100 

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 
1105 1110 1115 1120 

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 
1125 1130 1135 

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 
1140 1145 1150 

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met 
1155 1160 1165 

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 
1170 1175 1180 

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 
1185 1190 1195 1200 

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 
1205 1210 1215 

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asn Ile Lys Tyr 
1220 1225 1230 

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 
1235 1240 1245 

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 
1250 1255 1260 

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 
1265 1270 1275 1280 

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 
1285 1290 1295 

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 
1300 1305 1310 

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 
1315 1320 1325 

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 
1330 1335 1340 

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 
1345 1350 1355 1360 

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 
1365 1370 1375 

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 
1380 1385 1390 

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 
1395 1400 1405 

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 
1410 1415 1420 

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 
1425 1430 1435 1440 

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 
1445 1450 1455 

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 
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1460 1465 1470 

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 
1475 1480 1485 

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 
1490 1495 1500 

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 
1505 1510 1515 1520 

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 
1570 1575 1580 

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg 
1585 1590 1595 1600 

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 
1605 1610 1615 

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 
1620 1625 1630 

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 
1635 1640 1645 

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 
1650 1655 1660 

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 
1665 1670 1675 1680 

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 
1685 1690 1695 

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 
1700 1705 1710 

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 
1715 1720 1725 

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 
1730 1735 1740 

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe 
1745 1750 1755 1760 

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 
1765 1770 1775 

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 
1780 1785 1790 

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 
1795 1800 1805 

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 
1810 1815 1820 

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 
1825 1830 1835 1840 

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 
1845 1850 1855 

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 
1860 1865 1870 

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 
1875 1880 1885 

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 
1890 1895 1900 

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 
1905 1910 1915 1920 

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 
1925 1930 1935 

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 
1940 1945 1950 

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 
1955 1960 1965 

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 
1970 1975 1980 

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 
1985 1990 1995 2000 

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile 
2005 2010 2015 

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg 
2020 2025 2030 

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 
2035 2040 2045 

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 
2050 2055 2060 

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 
2065 2070 2075 2080 

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 
2085 2090 2095 

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 
2100 2105 2110 

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 
2115 2120 2125 

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 
2130 2135 2140 

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 
2145 2150 2155 2160 

Asn Leu Pro Asn Glu Gln 
2165 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 15219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60 

TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120 

ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180 

ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240 
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ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 300 

GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 360 

TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 420 

AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 480 

CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 540 

TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 600 

AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 660 

GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 720 

AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 780 

TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 840 

AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 900 

CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 960 

AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 1020 

CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 1080 

CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 1140 

TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 1200 

GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC 1260 

AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 1320 

TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 1380 

AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 1440 

GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 1500 

CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 1560 

AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 1620 

TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 1680 

CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 1740 

TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 1800 
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ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG 
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT 
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC 
AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATGAA 
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT 
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCTGCTA GACCTAGAGT GCGAATAGGT 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 
ACAATAGAAT TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 
CTAATCTTTT TACTAATTAT AATCACTATT ATGATTGCAA TACTAAATAA GCTAAGTGAA 
CATAAAGCAT TCTGTAACAA AACTCTTGAA CTAGGACAGA TGTATCAAAT CAACACATAG 
AGTTCTACCA TTATGCTGTG TCAAATTATA ATCCTGTATA TATAAACAAA CAAATCCAAT 
CTTCTCACAG AGTCATGGTG TCGCAAAACC ACGCTAACTA TCATGGTAGC ATAGAGTAGT 
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 4980 

AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 5040 

ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 5100 

ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 5160 

AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT 5220 

CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC 5280 

ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 5340 

GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 5400 

ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 5460 

ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 5520 

ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 5580 

TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 5640 

AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 5700 

TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 5760 

ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 5820 

TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC 5880 

TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA 5940 

GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 6000 

TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 6060 

ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 6120 

ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 6180 

AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA 6240 

TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 6300 

CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC 6360 

CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 6420 

CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT 6480 



GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA 6540 

AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG 6600 

GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT 6660 

AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA 6720 

GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC 6780 

ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA 6840 

AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 6900 

GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 6960 

ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 7020 

CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 7080 

AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 7140 

AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 7200 

TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 7260 

CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 7320 

CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 7380 

ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT 7440 

CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC 7500 

TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 7560 

ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 7620 

AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 7680 

AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 7740 

AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 7800 

GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 7860 

ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 7920 

ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 7980 

AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 8040 
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CAAACCATCC , 


ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 8100 

ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 8160 

PAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 8220 

CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 8280 

TAAPCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 8340 

[illegible] AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 8400 

[illegible] TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 8460 

[illegible] ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 8520 

[illegible] CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 8580 

[illegible] CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 8640 

[illegible] GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 8700 

[illegible] TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 8760 

TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 8820 

AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 8880 

ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 8940 

GATGAAAACT CAGTACTTAC AACTATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 9000 

AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA 9060 

ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 9120 

TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC [illegible] 9180 

AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 9240 

AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC 9300 

AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 9360 

TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 9420 

AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 9480 

GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 9540 

ACAGAAGAAG ATCAATTTAG GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 9600 
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GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 
TGTAATGAAA CTAAGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 
GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 
CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 
AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 
AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 
CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 
CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 
GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 
CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 
GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 
TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 
ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 
TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 
AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT 
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 
GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 
AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATGACATTT 
GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 
GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 
ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 
CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 
GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 
ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 
GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 
GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 
AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 
ACAATGGACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 
AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 
ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 
GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT 
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA 
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 
ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 
AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 
TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 
GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 
TTTAAGTTGT GGTTTTTAAA ACGCCTTAAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 
GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 
AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 
GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 
CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 
CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 
AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 
AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 
TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 
GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 
GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 
TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 
AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 
CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 
CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 
ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 
GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT 14340 

AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 14400 

AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 14460 

ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 14520 

TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 14580 

AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 14640 

ATCGATGCAG ATATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 14700 

ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 14760 

ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 14820 

CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 14880 

TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 14940 

GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 15000 

TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 15060 

TAGTTATTAA AAAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 15120 

TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 15180 

ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 15219 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 
(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 
1 5 10 15 

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 
20 25 30 

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 
35 40 45 

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 
50 55 60 

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 
65 70 75 80 

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 
85 90 95 

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 
100 105 110 

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 
115 120 125 

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 
130 135 140 

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 
145 150 155 160 

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser 
165 170 175 

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 
180 185 190 

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe 
195 200 205 

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn 
210 215 220 

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser 
225 230 235 240 

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys 
245 250 255 

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 
260 265 270 

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 
275 280 285 

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 
290 295 300 
Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 
305 310 315 320 

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 
325 330 335 

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 
340 345 350 

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 
355 360 365 

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu 
370 375 380 

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 
385 390 395 400 

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu 
405 410 415 

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro 
420 425 430 

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn 
435 440 445 

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 
450 455 460 

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp 
465 470 475 480 

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr 
485 490 495 

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp 
500 505 510 

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro 
515 520 525 

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro 
530 535 540 

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser 
545 550 555 560 

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser 
565 570 575 
Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe 
580 585 590 

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn 
595 600 605 

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser 
610 615 620 

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln 
625 630 635 640 

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro 
645 650 655 

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu 
660 665 670 

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr 
675 680 685 

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe 
690 695 700 

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu 
705 710 715 720 

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr 
725 730 735 

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe 
740 745 750 

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly 
755 760 765 

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu 
770 775 780 

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly 
785 790 795 800 

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp 
805 810 815 

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala 
820 825 830 

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 
835 840 845 

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg 
850 855 860 

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr 
865 870 875 880 

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr 
885 890 895 

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr 
900 905 910 

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe 
915 920 925 

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His 
930 935 940 

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys 
945 950 955 960 

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser 
965 970 975 

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 
980 985 990 

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 
995 1000 1005 

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 
1010 1015 1020 

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 
1025 1030 1035 1040 

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 
1045 1050 1055 

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 
1060 1065 1070 

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 
1075 1080 1085 

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 
1090 1095 1100 

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 
1105 1110 1115 1120 

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 
1125 1130 1135 
Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 
1140 1145 1150 

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met 
1155 1160 1165 

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 
1170 1175 1180 

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 
1185 1190 1195 1200 

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 
1205 1210 1215 

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr 
1220 1225 1230 

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 
1235 1240 1245 

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 
1250 1255 1260 

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 
1265 1270 1275 1280 

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 
1285 1290 1295 

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 
1300 1305 1310 

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 
1315 1320 1325 

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 
1330 1335 1340 

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 
1345 1350 1355 1360 

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 
1365 1370 1375 

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 
1380 1385 1390 

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 
1395 1400 1405 
Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 
1410 1415 1420 

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 
1425 1430 1435 1440 

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 
1445 1450 1455 

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 
1460 1465 1470 

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 
1475 1480 1485 

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 
1490 1495 1500 

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 
1505 1510 1515 1520 

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 
1570 1575 1580 

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg 
1585 1590 1595 1600 

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 
1605 1610 1615 

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 
1620 1625 1630 

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 
1635 1640 1645 

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 
1650 1655 1660 

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 
1665 1670 1675 1680 

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 
1685 1690 1695 

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 
1700 1705 1710 

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 
1715 1720 1725 

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 
1730 1735 1740 

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe 
1745 1750 1755 1760 

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 
1765 1770 1775 

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 
1780 1785 1790 

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 
1795 1800 1805 

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 
1810 1815 1820 

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 
1825 1830 1835 1840 

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 
1845 1850 1855 

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 
1860 1865 1870 

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 
1875 1880 1885 

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 
1890 1895 1900 

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 
1905 1910 1915 1920 

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 
1925 1930 1935 

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 
1940 1945 1950 

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 
1955 1960 1965 
Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 
1970 1975 1980 

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 
1985 1990 1995 2000 

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile 
2005 2010 2015 

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg 
2020 2025 2030 

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 
2035 2040 2045 

Ala Asp Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 
2050 2055 2060 

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 
2065 2070 2075 2080 

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 
2085 2090 2095 

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 
2100 2105 2110 

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 
2115 2120 2125 

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 
2130 2135 2140 

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 
2145 2150 2155 2160 

Asn Leu Pro Asn Glu Gln 
2165 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60 

TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120 

ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180 

ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240 

ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 300 

GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 360 

TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 420 

AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 480 

CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 540 

TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 600 

AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 660 

GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 720 

AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 780 

TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 840 

AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 900 

CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 960 

AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 1020 

CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 1080 

CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 1140 

TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 1200 

GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC 1260 

AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 1320 

TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 1380 

AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 1440 

GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 1500 
CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 
AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 
TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 
CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 
TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 
ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAC 
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATC 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT 
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC 
AAGCATGAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA 
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT 
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCCGCTA GACCTAGAGT GCGAATAGGC 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 
ACAATAGAAC TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 
CTAATCTTTT TACTAATTAT AATCACTATC ATGATTGCAA CACTAAATAA GCTAAGTGAA 
CACAAAGCAT TCTGCAACAA AACTCTTGAA CTAGGACAGA TGTACCAAAT CAACACACAG 
AGTTCCACCA TTATGCTGTG TCAAACCATA ATCCTGTATA TACAAACAAA CAAATCCAAT 
CCTCTCACAG AGTCACGGTG TCGCAAAACC ACGCTAACCA TCATGGTAGC ATAGAGTAGT 
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 
AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 
ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 
ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT 
CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC 
ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 
GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 
ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 
ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 
ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 
TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 
AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 
TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 
ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 
TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC 
TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA 
GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 
TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 
ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 
ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 
AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 
GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 
ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 
ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 
AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACCATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA 
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC 
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 9360 

TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 9420 

AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 9480 

GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 9540 

ACAGAAGAAG ATCAATTTAA GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 9600 

GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 9660 

ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 9720 

TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 9780 

AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 9840 

TGTAATGAAA CTAAGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 9900 

TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 9960 

ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 10020 

GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 10080 

CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 10140 

GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 10200 

GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 10260 

TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 10320 

CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 10380 

AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 10440 

ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 10500 

CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 10560 

AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 10620 

GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 10680 

CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 10740 

AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 10800 

AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 10860 
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 
CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 
CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 
GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 
CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 
GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 
TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 
ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 
TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 
AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT 
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 
GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 
AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 
GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 
GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 
ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 
CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 
GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 
ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 
GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 
GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 
AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 
ACAATGAACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 
AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 
ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 
GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT 
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA 
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 
ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 
AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 
TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 
GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 
TTTAAGTTGT GGTTTTTAAA ACGCCTTAAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 
GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 
AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 
GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 
CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 
CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 
AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 
AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 
TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 
GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 
GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 
TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 14040 

AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 14100 

CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 14160 

CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 14220 

ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 14280 

GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT 14340 

AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 14400 

AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 14460 

ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 14520 

TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 14580 

AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 14640

ATCGATGCAA ATATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 14700

ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 14760 

ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 14820 

CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 14880

TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 14940 

GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 15000 

TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 15060 

TAGTTATTAA AGAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 15120 

TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 15180

ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 15219 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn
130 135 140

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350

Lys Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu
980 985 990

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala
995 1000 1005

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asn Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala
1540 1545 1550

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu
1555 1560 1565

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn
1605 1610 1615 

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60

TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120 

ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180 

ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240 

ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 300 

GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 360 

TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 420 

AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 480

CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 540

TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 600 

AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 660 

GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 720 

AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 780

TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 840 

AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 900 

CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 960 

AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 1020

CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 1080 

CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 1140 

TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 1200
GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC
AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT
TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 
AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC
GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 
CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 
AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 
TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 
CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 
TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 
ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT

TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 2820

ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 2880 

AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 2940 

TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 3000 

AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 3060

TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 3120 

TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 3180 

CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 3240

AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 3300

ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 3360 

AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC 3420

AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 3480 

GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 3540

TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA 3600 

AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 3660 

CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT 3720 

GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 3780 

TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 3840

AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 3900

CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 3960

GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 4020

ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 4080 

ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 4140

CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCTGCTA GACCTAGAGT GCGAATAGGT 4200 

AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 4260 

CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 4320

ACAATAGAAT TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 
CTAATCTTTT TACTAATTAT AATCACTATT ATGATTGCAA TACTAAATAA GCTAAGTGAA 
CATAAAGCAT TCTGTAACAA AACTCTTGAA CTAGGACAGA TGTATCAAAT CAACACATAG 
AGTTCTACCA TTATGCTGTG TCAAATTATA ATCCTGTATA TATAAACAAA CAAATCCAAT
CTTCTCACAG AGTCATGGTG TCGCAAAACC ACGCTAACTA TCATGGTAGC ATAGAGTAGT
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA
AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 
ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC
ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT 
CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC
ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 
GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 
ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 
ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 
ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 
TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 
AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 
TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA
ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC
TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC
TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA
GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 
TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 
ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 
ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 
AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA 
TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 
CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC
CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 
CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT 
GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA 
AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG 
GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT
AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA 
GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC 
ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA 
AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT
GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 
ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 
CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 
AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 
AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 
TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 
CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 
CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA
ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT

CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC 7500

TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 7560 

ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 7620

AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 7680

AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 7740

AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 7800 

GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 7860

ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 7920

ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 7980 

AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 8040

CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 8100

ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 8160 

CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 8220 

CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 8280

TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 8340 

TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 8400

TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 8460 

TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 8520 

GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 8580 

GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 8640 

AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 8700

ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 8760

TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 8820 

AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 8880 

ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 8940

GATGAAAACT CAGTACTTAC AACTATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 9000

AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA 9060

ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 9120 

TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 9180 

AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 9240 

AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC 9300 

AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 9360 

TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 9420 

AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 9480 

GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 9540 

ACAGAAGAAG ATCAATTTAG GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 9600 

GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 9660

ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 9720 

TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 9780

AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 9840

TGTAATGAAA CTAAGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 9900 

TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 9960 

ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 10020 

GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 10080 

CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 10140 

GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 10200 

GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 10260 

TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 10320 

CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 10380

AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 10440 

ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 10500

CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 10560

AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 10620

GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 10680 

CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 10740 

AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 10800 

AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 10860 

ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 10920 

CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 10980 

CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 11040

GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 11100 

CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 11160 

GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 11220 

TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 11280 

ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 11340

TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 11400

AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT 11460

GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 11520

GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 11580

AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 11640 

GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 11700 

GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 11760 

ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 11820

CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 11880 

GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 11940 

ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 12000 

GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 12060

GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 12120

AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 12180

ACAATGGACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 12240

AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 12300 

ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 12360 

GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 12420 

GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 12480 

TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 12540 

TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 12600 

CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 12660 

AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT 12720 

CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA 12780 

TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 12840

CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 12900 

AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 12960 

CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 13020

ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 13080

ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 13140 

AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 13200 

TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 13260

GTGATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 13320

TTTAAGTTGT GGTTTTTAAA ACGCCTTGAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 13380 

GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 13440 

AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 13500

GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 13560

CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 13620

CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 13680

AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 13740

AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 13800 

TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 13860 

GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 13920

GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 13980 

TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 14040 

AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 14100 

CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 14160 

CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 14220

ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 14280 

GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT 14340

AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 14400 

AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 14460

ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 14520 

TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 14580 

AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 14640 

ATCGATGCAG TTATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 14700 

ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 14760 

ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 14820 

CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 14880 

TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 14940

GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 15000 

TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 15060

TAGTTATTAA AAAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 15120

TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 15180 

ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 15219

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 
1 5 10 15 

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 
20 25 30 

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 
35 40 45 

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 
50 55 60 

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 
65 70 75 80 

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 
85 90 95 

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 
100 105 110 

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 
115 120 125 

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 
130 135 140 

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 
145 150 155 160 

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser 
165 170 175 

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 
180 185 190 

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe 
195 200 205 

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn 
210 215 220 

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser 
225 230 235 240 

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys 
245 250 255 

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 
260 265 270 

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 
275 280 285 

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 
290 295 300 

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 
305 310 315 320 

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 
325 330 335 

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 
340 345 350 

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 
355 360 365 

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu 
370 375 380 

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 
385 390 395 400 

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu 
405 410 415 

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro 
420 425 430 

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn 
435 440 445 

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 
450 455 460 

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp 
465 470 475 480 

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr 
485 490 495 

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp 
500 505 510 

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro 
515 520 525 

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro 
530 535 540 

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser 
545 550 555 560 

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser 
565 570 575 

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe 
580 585 590 

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn 
595 600 605 

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser 
610 615 620 

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln 
625 630 635 640 

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro 
645 650 655 

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu 
660 665 670 

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr 
675 680 685 

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe 
690 695 700 

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu 
705 710 715 720 

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr 
725 730 735 

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe 
740 745 750 

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly 
755 760 765 

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu 
770 775 780 

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly 
785 790 795 800 

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp 
805 810 815 

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala 
820 825 830 

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 
835 840 845 

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg 
850 855 860 

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr 
865 870 875 880 

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr 
885 890 895 

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr 
900 905 910 

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe 
915 920 925 

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His 
930 935 940 

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys 
945 950 955 960 

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser 
965 970 975 

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 
980 985 990 

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 
995 1000 1005 

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 
1010 1015 1020 

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 
1025 1030 1035 1040 

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 
1045 1050 1055 

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 
1060 1065 1070 

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 
1075 1080 1085 

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 
1090 1095 1100 

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 
1105 1110 1115 1120 

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 
1125 1130 1135 

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 
1140 1145 1150 

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met 
1155 1160 1165 

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 
1170 1175 1180 

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 
1185 1190 1195 1200 

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 
1205 1210 1215 

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr 
1220 1225 1230 

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 
1235 1240 1245 

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 
1250 1255 1260 

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 
1265 1270 1275 1280 

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 
1285 1290 1295 

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 
1300 1305 1310 

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 
1315 1320 1325 

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 
1330 1335 1340 

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 
1345 1350 1355 1360 

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 
1365 1370 1375 

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 
1380 1385 1390 

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 
1395 1400 1405 

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 
1410 1415 1420 

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 
1425 1430 1435 1440 

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 
1445 1450 1455 

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 
1460 1465 1470 

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 
1475 1480 1485 

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 
1490 1495 1500 

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 
1505 1510 1515 1520 

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 
1570 1575 1580 

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg 
1585 1590 1595 1600 

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asp 
1605 1610 1615 

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 
1620 1625 1630 

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 
1635 1640 1645 

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 
1650 1655 1660 

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 
1665 1670 1675 1680 

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 
1685 1690 1695 

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 
1700 1705 1710 

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 
1715 1720 1725 

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 
1730 1735 1740 

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe 
1745 1750 1755 1760 

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 
1765 1770 1775 

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 
1780 1785 1790 

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 
1795 1800 1805 

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 
1810 1815 1820 

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 
1825 1830 1835 1840 

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 
1845 1850 1855 

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 
1860 1865 1870 

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 
1875 1880 1885 

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 
1890 1895 1900 

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 
1905 1910 1915 1920 

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 
1925 1930 1935 

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 
1940 1945 1950 

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 
1955 1960 1965 

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 
1970 1975 1980 

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 
1985 1990 1995 2000 

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile 
2005 2010 2015 

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg 
2020 2025 2030 

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 
2035 2040 2045 

Ala Val Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 
2050 2055 2060 

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 
2065 2070 2075 2080 

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 
2085 2090 2095 

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 
2100 2105 2110 

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 
2115 2120 2125 

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 
2130 2135 2140 

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 
2145 2150 2155 2160 

Asn Leu Pro Asn Glu Gln 
2165 



(2) INFORMATION FOR SEQ ID NO:35: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
CATATCACTC ACTCTGGGAT GGAG 24 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
TCAGAACATC AAGCACCGCC 20 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
ACAGTCAAGA CTGAGATGAG 20 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
AAGAGTCAGA TACATGTGGA 20 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
ACATGAATCA GCCTAAAGTC 20 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
CCGAAAGAGT TCCTGCGTTA CGACC 25 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

CAGTCCACAC AAGTACCAGG 20 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GTCAGAAGCT GTGGACCATC 20 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
AATATTGCTA CAACAATOGC 20 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
ACTCTTCATT CCTAGACTGG 20 
(2) INFORMATION FOR SEQ ID NO:45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
GTCCAATTAT GACTATGAAC 20 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
AGAACAGACA TGAAGCTTGC 20 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
CCAACAAGGA ATGCTTCTAG 20 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
ACAGCACTAT CTATGATTGA CCTGG 25 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
GCAACATGGT TTACACATGC 20 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
AGATTGAGAG TTGATCCAGG 20 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
AGGAGATACT TAAACTAAGC 20 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
TAAGCTTATG CCTTTCAGCG 20 
(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: 
TTAACGGACC TAAGCTGTGC 20 
(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
GAAACAGATT ATTATGACGG 20 
(2) INFORMATION FOR SEQ ID NO:55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
CGGGCTATCT AGGTGAACTT CAGG 24 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
ATTTGGATAT GGAATATGAG 20 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 

ACTCAACTGA ACTACCAGTG 20 

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
AAGAACATCA TGTATTTCAG 20 
(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
TTATCAACGC ACTGCTCATG 20 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
ATTTTCAGCA ATCACTTGGC ATGCC 25 
(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
GCCTCTGTGC AAACAAGCTG 20 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
TCTCTAGTTA CTCTAGCAGC 20 
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 
AGGTCGTTGT TTGTGAGGAG 20 

(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 
TCGTCCTCTT CTTTACTGTC 20 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 
CCGTCCTCGA GCTAGCCTCG 20 
(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
CTCCTCCAGG CTCACATTGG 20 
(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GGGTTGGTAC ATAGCTCTGC 20 
(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
CACCCATCTG ATATTTCCCT GATGG 25 
(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
TGGTTGACAG TACAAATCTG 20 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

CTGAAATGGG AAGATTGTGC 20 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
AGCAATCTAC ACTGCCTACC 20 

(2) INFORMATION FOR SEQ ID NO:72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: 
TCACAGATGA TTCAATTATC 20 

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
GATCCTAGAT ATAAGTTCTC 20 
(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 
ACCAAACAAA GTTGGGTAAG G 21 

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
GGGGGATCCA TCCCTAATCC TGCTCTTGTC CC 32 
(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 
GATTCCTCTG ATGGCTCCAC 20 

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
TAACAGTCAA GGAGACCAAA O 21 

(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 
GGGAAGCTTA ACCCTAATCC TGCCCTAGGT GG 32 
(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
ACCAGACAAA GCTGGGAATA GA 22 

What is claimed is: 

1. An isolated, recombinantly-generated, 
attenuated, nonsegmented, negative-sense, single 
stranded RNA virus of the Order Mononegavirales having 
at least one attenuating mutation in the 3' genomic 
promoter region and having at least one attenuating 
mutation in the RNA polymerase gene. 

2. The virus of Claim 1 wherein the virus 
is from the Family Paramyxoviridae. 

3. The virus of Claim 2 wherein the virus 
is from the Subfamily Paramyxovirinae. 

4. The virus of Claim 3 wherein the virus 
is from the Genus Morbillivirus. 

5. The virus of Claim 4 wherein the virus 
is measles virus. 

6. The measles virus of Claim 5 wherein: 

(a) the at least one attenuating mutation in 
the 3' genomic promoter region is 
selected from the group consisting of 
nucleotide 26 (A -> T), nucleotide 42 (A 
-> T or A -> C) and nucleotide 96 (G -> 
A), where these nucleotides are 
presented in positive strand, 
antigenomic, message sense; and 

(b) the at least one attenuating mutation in 
the RNA polymerase gene is selected from 
the group consisting of nucleotide 
changes which produce changes in an 
amino acid selected from the group 
consisting of residues 331 (isoleucine -> 
threonine), 1409 (alanine -> 
threonine), 1624 (threonine -> alanine), 
1649 (arginine -> methionine), 1717 
(aspartic acid -> alanine), 1936 
(histidine -> tyrosine), 2074 
(glutamine -> arginine) and 2114 
(arginine -> lysine). 

7. The virus of Claim 3 wherein the virus 
is from the Genus Paramyxovirus. 

8. The virus of Claim 7 wherein the virus 
is human parainfluenza virus type 3 (PIV-3). 

9. The PIV-3 of Claim 8 wherein: 

(a) the at least one attenuating mutation in 
the 3' genomic promoter region is 
selected from the group consisting of 
nucleotide 23 (T -> C), nucleotide 24 (C 
-> T), nucleotide 28 (G -> T) and 
nucleotide 45 (T -> A), where these 
nucleotides are presented in positive 
strand, antigenomic, message sense; and 

(b) the at least one attenuating mutation in 
the RNA polymerase gene is selected from 
the group consisting of nucleotide 
changes which produce changes in an 
amino acid selected from the group 
consisting of residues 942 (tyrosine -> 
histidine), 992 (leucine -> 
phenylalanine), 1292 (leucine -> 
phenylalanine), and 1558 (threonine -> 
isoleucine). 

10. The virus of Claim 3 wherein the virus 
is from the Genus Rubulavirus. 

11. The virus of Claim 2 wherein the virus 
is from the Subfamily Pneumovirinae. 

12. The virus of Claim 11 wherein the virus 
is from the Genus Pneumovirus. 

13. The virus of Claim 12 wherein the virus 
is human respiratory syncytial virus (RSV) subgroup B. 

14. The virus of Claim 13 wherein: 

(a) the at least one attenuating mutation in 
the 3' genomic promoter region is 
selected from the group consisting of 
nucleotide 4 (C -> G) and the insertion 
of an additional A in the stretch of A's 
at nucleotides 6-11, where these 
nucleotides are presented in positive 
strand, antigenomic, message sense; and 

(b) the at least one attenuating mutation in 
the RNA polymerase gene is selected from 
the group consisting of nucleotide 
changes which produce changes in an 
amino acid selected from the group 
consisting of residues 353 (arginine -> 
lysine), 451 (lysine -> arginine), 1229 
(aspartic acid -> asparagine), 2029 
(threonine -> isoleucine) and 2050 
(asparagine -> aspartic acid). 

15. The virus of Claim 1 wherein the virus 
is from the Family Rhabdoviridae. 

16. The virus of Claim 1 wherein the virus 
is from the Family Filoviridae. 

17. A vaccine comprising an isolated, 
recombinantly-generated, attenuated, nonsegmented, 
negative-sense, single stranded RNA virus of the Order 
Mononegavirales according to Claim 1 and a 
physiologically acceptable carrier. 

18. The vaccine of Claim 17 comprising a 
measles virus according to Claim 5 and a 
physiologically acceptable carrier. 

19. The vaccine of Claim 18 comprising a 
measles virus according to Claim 6 and a 
physiologically acceptable carrier. 

20. The vaccine of Claim 17 comprising a 
PIV-3 according to Claim 8 and a physiologically 
acceptable carrier. 

21. The vaccine of Claim 20 comprising a 
PIV-3 according to Claim 9 and a physiologically 
acceptable carrier. 

22. The vaccine of Claim 17 comprising an 
RSV subgroup B according to Claim 13 and a 
physiologically acceptable carrier. 

23. The vaccine of Claim 22 comprising an 
RSV subgroup B according to Claim 14 and a 
physiologically acceptable carrier. 

24. A method for immunizing an individual to 
induce protection against a nonsegmented, 
negative-sense, single stranded RNA virus of the Order 
Mononegavirales which comprises administering to the 
individual the vaccine of Claim 17. 

25. The method of Claim 24 wherein the 
vaccine is the vaccine of Claim 18. 

26. The method of Claim 25 wherein the 
vaccine is the vaccine of Claim 19. 

27. The method of Claim 24 wherein the 
vaccine is the vaccine of Claim 20. 

28. The method of Claim 27 wherein the 
vaccine is the vaccine of Claim 21. 

29. The method of Claim 24 wherein the 
vaccine is the vaccine of Claim 22. 

30. The method of Claim 29 wherein the 
vaccine is the vaccine of Claim 23. 

31. An isolated nucleic acid molecule 
comprising a measles virus sequence in positive strand, 
antigenomic message sense selected from the group 
consisting of 1977 wild-type strain (SEQ ID NO:3), 1983 
wild-type strain (SEQ ID NO:5) where the nucleotide 
2499 is G or C, Montefiore wild-type strain (SEQ ID 
NO:7), Rubeovax™ vaccine strain (SEQ ID NO:9), where 
the nucleotide 2143 is T or C, Moraten vaccine strain 
(SEQ ID NO:11), Schwarz vaccine strain (SEQ ID NO:11), 
where the nucleotide 4917 is C and the nucleotide 4924 
is C, and Zagreb vaccine strain (SEQ ID NO:13), and the 
complementary genomic sequences thereof. 

32. An isolated nucleic acid molecule 
comprising a PIV-3 sequence in positive strand, 
antigenomic message sense selected from the group 
consisting of cp45 vaccine strain grown in fetal rhesus 
lung cells (SEQ ID NO:19) and cp45 vaccine strain grown 
in Vero cells (SEQ ID NO: 21), and the complementary 
genomic sequences thereof. 

33. A composition which comprises a 
transcription vector comprising an isolated nucleic 
acid molecule encoding a genome or antigenome of a 
nonsegmented, negative-sense, single stranded RNA virus 
of the Order Mononegavirales having at least one 
attenuating mutation in the 3' genomic promoter region 
and having at least one attenuating mutation in the RNA 
polymerase gene, together with at least one expression 
vector which comprises at least one isolated nucleic 
acid molecule encoding the trans-acting proteins 
necessary for encapsidation, transcription and 
replication, whereby upon expression an infectious 
attenuated virus is produced. 

34. The composition of Claim 33 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes a measles virus according to 
Claim 5 and the at least one expression vector 
comprises at least one isolated nucleic acid molecule 
encoding the trans-acting proteins N, P and L. 

35. The composition of Claim 34 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes a measles virus according to 
Claim 6. 

36. The composition of Claim 33 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes a PIV-3 according to Claim 8 and 
the at least one expression vector comprises at least 
one isolated nucleic acid molecule encoding the trans- 
acting proteins NP, P and L. 

37. The composition of Claim 36 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes a PIV-3 according to Claim 9. 

38. The composition of Claim 33 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes an RSV subgroup B according to 
Claim 13 and the at least one expression vector 
comprises at least one isolated nucleic acid molecule 
encoding the trans-acting proteins N, P, L and M2. 

39. The composition of Claim 38 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes an RSV subgroup B according to 
Claim 14. 

40. A method for producing infectious 
attenuated nonsegmented, negative-sense, single 
stranded RNA virus of the Order Mononegavirales which 
comprises transforming or transfecting host cells with 
the at least two vectors of Claim 33 and culturing the 
host cells under conditions which permit the co- 
expression of these vectors so as to produce the 
infectious attenuated virus. 

41. The method of Claim 40 wherein the virus 
is the measles virus of Claim 5. 

42. The method of Claim 41 wherein the virus 
is the measles virus of Claim 6. 

43. The method of Claim 40 wherein the virus 
is the PIV-3 of Claim 8. 

44. The method of Claim 43 wherein the virus 
is the PIV-3 of Claim 9. 

45. The method of Claim 40 wherein the virus 
is the RSV subgroup B of Claim 13. 

46. The method of Claim 45 wherein the virus 
is the RSV subgroup B of Claim 14. 

FURTHER INFORMATION CONTINUED FROM PCT/ISA/210 



This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 4-6,18,19,25,26,31,34,35,41,42 (complete) and 1-3, 
17,24,33,40 (partially) 



Recombinantly generated, attenuated Morbillivirus having at 
least one attenuating mutation in the 3' genomic promoter 
region and at least one attenuating mutation in the RNA 
polymerase gene; vaccine comprising it; nucleic acid 
molecules encoding said virus; composition comprising said 
nucleic acid together with an expression vector providing 
all trans-acting proteins necessary for the production of 
said attenuated virus and a method of producing it. 



2. Claims: 7-9,20,21,27,28,32,36,37,43,44 (complete) and 1-3, 
17,24,33,40 (partially) 



Recombinantly generated, attenuated Paramyxovirus having at 
least one attenuating mutation in the 3' genomic promoter 
region and at least one attenuating mutation in the RNA 
polymerase gene; vaccine comprising it; nucleic acid 
molecules encoding said virus; composition comprising said 
nucleic acid together with an expression vector providing 
all trans-acting proteins necessary for the production of 
said attenuated virus and a method of producing it. 



3. Claims: 10 (complete) and 1-3,17,24,33,40 (partially) 

Recombinantly generated, attenuated Rubulavirus having at 
least one attenuating mutation in the 3' genomic promoter 
region and at least one attenuating mutation in the RNA 
polymerase gene; vaccine comprising it; composition 
comprising a nucleic acid molecule encoding said virus 
together with an expression vector providing all 
trans-acting proteins necessary for the production of said 
attenuated virus and a method of producing it. 



4. Claims: 11-14,22,23,29,30,38,39,45,46 (complete) and 1,2, 
17,24,33,40 (partially) 



Recombinantly generated, attenuated Pneumovirinae having at 
least one attenuating mutation in the 3' genomic promoter 
region and at least one attenuating mutation in the RNA 
polymerase gene; vaccine comprising it; composition 
comprising a nucleic acid molecule encoding said virus 
together with an expression vector providing all 
trans-acting proteins necessary for the production of said 
attenuated virus and a method of producing it. 



5. Claims: 15 (complete) and 1,17,24,33,40 (partially) 

Recombinantly generated, attenuated Rhabdoviridae having at 
least one attenuating mutation in the 3' genomic promoter 
region and at least one attenuating mutation in the RNA 
polymerase gene; vaccine comprising it; composition 
comprising a nucleic acid molecule encoding said virus 
together with an expression vector providing all 
trans-acting proteins necessary for the production of said 
attenuated virus and a method of producing it. 



6. Claims: 16 (complete) and 1,17,24,33,40 (partially) 

Recombinantly generated, attenuated Filoviridae having at 
least one attenuating mutation in the 3' genomic promoter 
region and at least one attenuating mutation in the RNA 
polymerase gene; vaccine comprising it; composition 
comprising a nucleic acid molecule encoding said virus 
together with an expression vector providing all 
trans-acting proteins necessary for the production of said 
attenuated virus and a method of producing it. 